A lot of people might classify this as a “newbie” piece. But, I would say that I’ve seen inefficient data usage plenty of times due to just laziness. Some people might even say “premature optimization is the root of all evil” and just make field sizes way bigger than they need. I like to try to keep my DB lean-and-mean when possible, though. Everyone, myself included, gets lazy once in a while when cranking out some prototype code.

These tips are not necessarily MySQL specific, but I’m using MySQL syntax as an example for the sake of simplicity.

SIGNED vs UNSIGNED

This is probably the easiest habit to practice because you have nothing to lose and everything to gain by using SIGNED /UNSIGNED data types in the right place. There’s a simple question you can ask – will this field ever contain a negative value? If the answer is no, then you want an UNSIGNED data type.

A common mistake that I see is a primary key that is an auto-increment INT starting at zero, yet the type is SIGNED. An UNSIGNED INT can go from 0 to 4294967295 (aprox 4 billion). A SIGNED INT , on the other hand, starts at -2147483648 and goes to 2147483648 (aprox 2 billion). If you use an auto-increment starting at 0, you’ll never touch any of those negative numbers and your Id column will top out at around 2 billion.

Use a Smaller ID Data Type for Small Tables

You don’t usually know how many records your application will eventually store. You don’t want to use too small a data type in the event that your app gets popular. You might tend to use INT for your primary key column just out of habit.

In some situations, though, you do have a pretty good idea of how many records you’ll have and so you don’t need to use an INT as the primary key. For tables that contain things like permissions, system settings, account types, etc you may know that you will never need a huge number of records. You may not even create a user interface to access these tables or insert new records.

In cases like this, I find an UNSIGNED TINYINT is perfect. This will hold values from 0 to 255. Sometimes 255 is more than enough, but still gives you room to grow. You don’t want to do this if you expect to get close to this limit of course. If the data is strictly internal and there’s no user interface, the developer is usually in control of these records.

A lot of people use UNSIGNED INT as their primary key which as I mentioned earlier goes up to over 4 billion records. That is a lot of records! An UNSIGNED MEDIUMINT goes up to 16 million records which is still more than most tables need. An UNSIGNED SMALLINT goes up to 65 thousand records. You don’t want to run out of space, but you can try to optimize a bit for the best performance. Keep in mind that your primary key is the field that will be theoretically used the most for record lookups, so the more efficient you can make it the faster your app will run.

Read more about numerical datatypes on the MySQL site…

Know the Difference Between CHAR and VARCHAR

CHAR and VARCHAR both hold string/text data. However they work a little differently because CHAR is like a fixed-size array that is always the same size, where as VARCHAR will expand and contract to fit the data.

The CHAR type is useful if you have a good idea of the size that the field will be. A product code that is always 10 digits, for example, is a great candidate for a CHAR data type. The negative side of the CHAR type is that it will use up the disk space whether or not you need it. For example, if you have a CHAR(25) but your record only uses up 5 characters, MySQL will still use up all 25 on disk.

The VARCHAR is convenient because you can set it to 25, but if you only store 5 characters, the VARCHAR will only use the space it needs. Two tradeoffs for this convenience are performance and fragmentation. In particular, when you update a VARCHAR field, you might outgrow the space that was originally allocated. Your data becomes fragmented so that one record may be spread out across the DB file instead of in one nice, continuous chunk. So as time goes by, your access to the VARCHAR data will require more processing power.

MySQL provides a solution in the way of the OPTIMIZE TABLE statement. This basically will de-fragment your table VARCHAR fields. If you are using VARCHARS and doing a lot of updating, you should make sure to optimize the tables periodically. This is something that can be automated through cron or Scheduled Tasks so that you don’t forget.

In case you think it’s a simple choice, keep in mind that table size and performance are closely related. So, it follows that the VARCHAR would deliver performance benefits due to it being the smallest possible size. That would offset the performance benefit that the consistency of CHAR delivers. My rule of thumb is to use CHAR if you have a pretty good idea of the field size.

More reading is available on the MySQL site, however I find the user comments on the end of this page more interesting than the official documentation.

Don’t mix “SELECT * FROM” with TEXT/BLOB fields

TEXT and BLOB fields are extremently useful for storing large datatypes. File uploads, XML data, long text content, etc usually require one of these types of fields. Unfortunately, these fields can make a table grow large quickly. Storing binary file data for example can make a DB table enormous.

Depending on your application, this may just be the best way to store this type of data. However, if you don’t need this data on every query, then don’t select it. Here’s an example:

select * from file_upload where fu_owner_id = ‘1′

What we’re doing here is grabbing the full row contents for the file uploads, though we may not need all of that data. You can imagine that this query might select all of the files that have been uploaded by a specific user for us to show in a datagrid, dropdown list or whatever. The issue here is that you rarely need to have the *contents* of the files when you are displaying them in a list. If possible, this might be better:

select fu_id, fu_name from file_upload where fu_owner_id = ‘1′

In this case we have the primary key (fu_id) and the file name (fu_name). We can show this in the list Now, if the user wants to download or see the contents of the file, you can grab it using the primary key:

select fu_name, fu_data from file_upload where fu_id = ‘55′

An alternative approach is to use a dedicated table with only a primary key and the BLOB/TEXT field. Any other tables that need to store large values can simply have a reference to that table’s primary key instead. If you only do lookups from the dedicated table by it’s primary key, then you’ll avoid any full table scans.

Read The Fine Manual

The MySQL site has plenty of information describing the various data types. The little bit of time and effort that you put into tuning your database will pay off as your application grows.

More reading on MySQL 5 data types can be found here: http://dev.mysql.com/doc/refman/5.0/en/data-types.html