Choosing the best type and name!
We will be running through a few tips and tricks I’ve run into when it comes to designing a solid database. Please note this tutorial will be based off PostgreSQL however most concepts apply to any SQL database.
Let get started!
Part 1: Numerous Numerics
We need to be aware of what datatypes we are choosing and the long term effects they can have on our database. When our tables are only a few hundred rows long, it can be hard to see the impact of choosing the wrong datatype, but as the size scales up so does those effects.
PostreSQL offers several numeric datatypes including,
- SMALLINT (2-byte integer ranging from -32,768 to 32,767)
- INT (4-byte integer that range from -2,147,483,648 to 2,147,483,647)
- BIGINT (8-bytes integer from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807)
When we are choosing which integer type to use we must be aware, regardless of the value, the amount of space the integer occupies on the disk does not change. By choosing the smallest size possible we are reducing the size of our rows and by extension the database. This allows faster reads are writes. PostgreSQL’s SERIAL datatype (auto-incrementing) also has these various size option (although it is unsigned). We will talk more about SERIAL in Part 3.
Part 2: Storing our Strings
Choosing the best data type to store textual information in PostgreSQL can actually be a bit confusing. We have 3 primary options,
- Char(n) (fixed length of n, blank padded)
- Text (variable length)
- Varchar(n) (variable length, with a max length of n)
The confusing part comes when we try to understand the difference between these. varChar, without a max length specified, behaves like Text. When we do specify the max length, n, PostgreSQL will simply perform a check and issue an error if it is greater than n. More importantly, unlike most database systems, PostgreSQL does not have a performance benefit when using char; therefore we should try to use text, or varchar(n) when we want a length limit.
When we are setting limits using varChar, it is good practice to avoid using magic numbers. For example, if we are creating a name column in our cats table, but designer have told us they would like a maximum length of 50, we should leave a comment specifying such.
Part 3: Enough Enum
Using Enum is a great way to enforce certain types of data being stored. For example, let say we want to add a status column to our match table allowing values of only; “best friend”, “friend”, and “not friends”. Enum is a great way of accomplishing this! However, if we later want to allow for the possibility of “enemy” being stored as a status, it can be very difficult and requires rebuilding the entire table. Although this might be seen as a con, this difficulty will prevent developers from thoughtless adding in “bad” data. It also means we must be careful when using the enum data type, and make sure it’s being used in a future proof manner.
This is both a pro and a con. Pro in the sense it will prevent developers from thoughtlessly adding in bad data, and con if it
Enum allows us to enforce certain types of data being stored, for example, if we want to set out cats “status” to either best friends, friend, or enemies we can do this using enum. However if later we want to add an additional status “neutral” we would have to rebuild the entire table.
Part 4: Naming (this is a bad example)
Naming might sound pretty simple relative to everything else we’ve covered but I would actually say it’s one of the more difficult concepts. When we are naming our variables, functions, or in this specific case, our database entities and attributes we want to make sure we can make them as readable as possible. What do we mean by readable? A new developer should be able to look at the name without additional context and efficiently understand what is being represented.
Readability is essential; so important in fact companies will prioritize readability over optimization of code, due to the cost of developer time. For (<- a pun) example both mapping and forEach loops in Javascript are less optimized than for loops. However, we are pushed to use them because they are cleaner and easier to read by incoming developers
Part of the difficulty with naming is different places will have different conventions. First things first, if we adding new tables or fields to an existing database, understand their convention, and follow it!
Some common conventions to look out for:
-
Table Names
- Lower-case?
- Pluralized?
-
How are fields named?
- Lower-case?
- Have they been used elsewhere?
For example, if we see someone has used imageurl in one place to store the picture path of an animal, don’t suddenly introduce image, url, imagepath, pic, etc to store the image path of a new table - When we’re creating a new name could someone seeing it for the first time understand it?
_For example, if we are storing the URL to a cat’s image do we call it image, catimage, or imageurl? image_url is the closest to the data being stored, so let’s go with that!4.
-
How are foreign keys named?
- Is the table name it is referencing used?
i.e. the foreign key name is catid if it is referencing the table cats_
- Is the table name it is referencing used?
If we are unsure about a name, be that if it’s following convention, or if it’s clear, I find it extremely helpful to leave a reminder to come back to it later. When we look at poor names enough, they begin to sound good to us and are easy to miss when we review our own code. If we are unsure if a name is clear ask someone, “what do you think x is” (they don’t have to be a dev)!