March 1, 2009 8:40 p.m.
There's a bit of debate over whether UUIDs(or GUIDs - for Microsoft software) are suitable to be used as primary keys for objects in a database. I'm not so much going to describe the debate, as there's no point in arguing, it comes down to a design decision. You can think of the pros and cons, and decide for yourself.
One of the big concerns of using autoincremented fields from a database, is that if a lot of inserts are happening, especially distributed inserts, everyone is waiting on the database to generate a new id. Some large distributed architectures actually have dedicated servers whose only purpose is to give out ids to objects.
The advantage of using autoincremented values is there's a guarantee of uniqueness. However, that guarantee comes at a high performance bottleneck, due to the mutual exclusive database access needed to generate the next id. This bottleneck cannot be resolved by parallelism, due to race conditions, there still is only one mutex lock, so everyone still waits in one line for an id.
The UUID solution is instead to generate a unique id that is not guaranteed to be unique, but has a very high probability of being unique.
".. after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%."
I'm pretty comfortable with those odds, I don't see myself generating more than a billion UUIDs anytime soon.
I just generated a UUID: 1e8ef774-581c-102c-bcfe-f1ab81872213
A UUID like the one above is 36 characters long, including dashes. If you store this VARCHAR(36), you're going to decrease compare performance dramatically. This is your primary key, you don't want it to be slow.
At its bit level, a UUID is 128 bits, which means it will fit into 16 bytes, note this is not very human readable, but it will keep storage low, and is only 4 times larger than a 32-bit int, or 2 times larger than a 64-bit int. I will use a VARBINARY(16) Theoretically, this can work without a lot of overhead.
I'm not really satisfied with the UUID generator for MySQL, they use a very old standard based on timestamps and MAC addresses(Version 1). This potentially can be a problem because a UUID can be traced to a particular machine, and a particular time.
I'd rather have more random bits than include a machine signature, who really needs to track down their UUIDs to which machine generated them? I just want them to be unique, and don't need unneccessary bread crumbs lying around. Most likely I'd use Version 3,4, or 5.
I'll be using my own UUID generator, which is a fantastic side-effect of UUIDs, you can generate the numbers however you want. You should be following the spec, or your results are not guaranteed.
create table uuid_demo ( id VARBINARY(16), name VARCHAR(10) );
I created a simple table to illustrate usage.
I'm using the mysql console to demonstrate. Inserting the UUID goes like so:
mysql> INSERT INTO uuid_demo SET id=0x1e8ef774581c102cbcfef1ab81872213, name="Kekoa"; Query OK, 1 row affected (0.00 sec)
Notice that the id is not entered as a string, but as a large hex number. It is the same UUID as generated above, but without the dashes and with a 0x to indicate hex.
Now to display what's in the table:
mysql> SELECT * FROM uuid_demo; +------------------+-------+ | id | name | +------------------+-------+ | #####,##??" | Kekoa | +------------------+-------+ 1 row in set (0.00 sec)
Well that's not very helpful, how do we retrieve the data? There's a couple of ways, your SQL driver will correctly receive the binary data into your field, then you'll have to parse it, however, you can use the HEX() function in mysql to quickly do the conversion for you:
mysql> SELECT hex(id), name FROM uuid_demo; +----------------------------------+-------+ | hex(id) | name | +----------------------------------+-------+ | 1E8EF774581C102CBCFEF1AB81872213 | Kekoa | +----------------------------------+-------+ 1 row in set (0.00 sec)
Ahh, much better. You can treat the result as a string and add the dashes or whatever you need if you want it to be pretty. If your language supports binary data well, you may consider retaining the binary data as is.
So wherever in your queries you need to reference the id, do so using the hex value of the UUID, and when retrieving the rows make sure you remember that the id is binary, and handle it appropriately.