Using Binary UUIDs as Primary Keys

Mon, 23 Sep 2019

Primary keys are important

That’s why they’re called primary keys! What we use for our primary key depends heavily on what we are using it for. In this article, we will focus on comparing different alternative primary key types to decide, under what conditions, we should use them. For those that don’t care about the why (T_T’) please skip to the end.

Otherwise.. Let get started!

Part 1: Categories of Primary Keys

There are 3 main categories of primary keys: artificial, synthetic, or composite; Natural: uses a unique field already available in the table Synthetic: creates a new field where a unique generated value will be stored Composite: uses multiple fields to represent a primary key (where the combination of the two is unique)

How do we know when to use what?

I rarely use synthetic keys. This is because natural keys are often not unique or persistent, even when we think they should be. Take a social security number. Aside from the fact it is illegal to use this for non-tax related purposes, not everyone has one, and some people (maliciously or otherwise) can give us the wrong one.

The only exception I make to this rule is when we are using the unique id generated in an external source as a reference to that external entity. For example, let’s say I’m creating a micro-service for my school. In our application, we would like to keep track of which books belong to which students, but we do not want to store information about students. Instead, we want to pull this information for the school’s API. When we create our mapping of books to students we can use the composite key produced by combining the natural primary key of the student, and the synthetic primary key we generate for the books.

Look at that, we also introduced a common scenario when to use a composite key there! A mapping between 2 tables with a many-to-many relationship, aka an associative entity table.

Another scenario where composite keys come in handy is weak entity tables or when we have multivalued attributes. For example, let’s say we allow our users to have multiple phone numbers but we don’t know how many. We can create a weak entity table to store them, where the phone number and the user id are used to generate a composite primary key.

Finally, synthetic keys are my catch-all. In general, I will always create a unique id for my tables save for the cases mentioned above. When we are creating synthetic keys, however, we have lots of different options as to what data type we are using. In the next section, we are going to take a look at some of these options.

Part 2: Types of Synthetic Primary Keys

When generating artificial keys we are going to consider 4 possible alternatives: serial, bigint, serial, uuid, and binary uuid. Below we will summarize the difference between these.

Type Description Cons Pros
serial - auto increment - not secure (guessable) - not distributable (central) - Smaller - Less Complex (Developers shifting from traditional SQL can find this difficult)
bigint -random unique 64-bit unsigned integers, which are based on time - To allow javascript and JSON parsers to read bigint IDs we need to include a string version of any ID when responding in the JSON format - does not guarantee uniqueness - not a popular implementation - Uncoordinated (no central coordination point) - Compact (generated ids are 64-bit long.) - (Roughly) Time-ordered (the ids will be k-ordered) - Good performance (generate 4096 ids per millisecond)
uuid - universal unique identifier (The idea being that that id number will be universally unique) note * UUID is the same as GUID (Microsoft) - 128 bits in length - added “complexity” to developers - In a distributed database this prevents pk collision - Not limited by bigint’s size (infinite) - Better security - Establishes better baseline (how it scales) - You can generate IDs anywhere, instead of having to roundtrip to the database
binary uuid - storing uuid as a binary instead of the traditional char - Must convert to and from binary when moving between application and database (though this should not be necessary on a frequent basis) - MySQL8 has not in-built the functionality to convert a uuid to binary - MySQL is faster using binary than anything else when indexes are required. - maintains pros of uuid - half the size of uuid

When generating the uuid we have several options available:

Version 1 UUIDs, the most common, combines a MAC address (which is now changable) and a timestamp to produce sufficient uniqueness. In the event of multiple UUIDs being generated fast enough that the timestamp doesn’t increment before the next generation, the timestamp is manually incremented by 1. If no MAC address is available, or if its presence would be undesirable for privacy reasons, 6 random bytes sourced from a cryptographically secure random number generator may be used for the node ID instead.

Version 3 and Version 5 UUIDs, the least common, use the MD5 and SHA1 hash functions respectively, plus a namespace, plus an already unique data value to produce a unique ID. This can be used to generate a UUID from a URL for example.

Version 4 UUIDs, are simply 128 bits of random data, with some bit-twiddling to identify the UUID version and variant.

To guarantee the effectiveness of the uuid I generally use version 4.

Conclusion

Unfortunatly like most decision we encounter the answer depends on the situation. Understanding the constraints surrounding our application is essential to selecting the correct alternative. Understanding what those constraints might look like in the future is equally important! If we predict our database might require distribution in the future, we should architect it to handle this now. However, we must also balance the amount of time we are deligating to accounting for all possible changes; if the refractor required is minor, it can be handled later if it’s needed.

Loading...
Stephanie Mills

Stephanie Mills

Living to learn. Designing to disrupt. Coding to create

  • Learning. Laughing. Coding. Creating.