How and why: Normalize your Data

When we talk about normalizing data within a database we really just mean that we’re organizing our data properly in order to make our database run more efficiently and reduce needless duplication. We can put our data through 5 different levels of organization, which we call “forms”.

Basic rules for normalization:

  • Each table should have a unique row identifier, i.e., a column or set of columns which can distinguish any record from any other record.
  • Each table should only store data for a single type of entity. (e.g., separate tables for sales, products, customers rather than one hybrid Frankentable)
  • A table should avoid nullable columns
  • A table should not have repeating fields or columns.

Consider the following table:

NmzPet1

This is a fairly straight forward table in the first normal form (1NF) showing clients of a veterinary practice. You can see that “John Smith” has two dogs: a Spaniel named Patch and a Dalmatian named Spot, and both John and “Sandra Evans” have Spaniels. Now imagine that John Smith changes his phone number or that you accidentally mistyped Spaniel. You now have to update your database in two places.

To get this table to the second normal form (2NF) you have to eliminate that redundancy. Look at the type of data you require and sort that into tables to ensure that you have as little duplication as possible. In this instance we can create several new tables.

NmzPet2

Get all the owner information out and into its own table. In a real life scenario this would probably also contain stuff like owner’s address but for this example we’re keeping it simple. Now every time John Smith updates his contact number we only have to change it in this one place.

NmzPet3

What else is duplicated? Pet species and breed information. The other thing we need for our database to conform to 2NF is to ensure each table has a primary key specified.

Edit the initial table to reflect these changes and what have we got?

NmzPet6

A lot less duplication but it’s still not very efficient for searching. Let’s go deeper.

To conform to second normal form (3NF) we have to get rid of partial dependencies of primary keys and columns.Let’s make another table:

NmzPet7

As you can see we’ve now completely separated the pets from their main table, and we’ve already separated the owners out.

NmzPet8
Our main table now looks like this:

NmzPet9

The primary key for this table will consist of both the owner_id and the pet_id. You should also place foreign keys to link the main table’s owner_id to the owner table’s id and the same for the main.pet_id and pet_details.id.

For every day purposes 3NF is what most most database designers aim for. While you can go up to the fourth normal form (4NF) or even fifth normal form (5NF) it starts to get a lot harder to keep track of how your tables interact with each other. Generally speaking a higher level of normalization requires the removal of multi-valued dependencies and it’s always worth thinking about how much you can separate out your information before it becomes too difficult to maintain and any efficiency boost is lost.

How far are your databases normalized? Have you ever had any issues with not normalizing far enough?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s