Database Normalization in Simple English
Have you heard of 2NF, 3NF, and BCNF? Do you know what the database normal forms mean? In this article, you will learn database normalization in simple English. With real-world examples, you will better understand what normal forms are, why there are different types, and why we need them.
Database courses typically teach database normalization and normal forms abstractly. Functional dependencies and normalization problems are defined using letters (e.g., Suppose that R(A,B,C,D) and A,B → C and C → D. Is R in 2NF?).
Without meaningful, concrete examples, students often have a hard time understanding what normal forms are, why we need them, and why there are so many. Students also often struggle with how to apply this theoretical knowledge to real-world situations.
In this article, I’ll show real-world examples of tables in different normal forms. Hopefully, these examples will help you better understand the different normal forms. I’ll also suggest tricks for remembering what each normal form is.
Why Do We Need Normal Forms?
Some database schemas are convoluted, complex, and unnecessarily intertwined. Just by looking at it, you intuitively feel if a design is not “right.” It might be messy or combine multiple different concepts in one table.
The theory of database normal forms is to precisely define this intuition. Simply, normal forms define which database schemas make sense.
3NF Is the Go-To Normal Form in Practice
In practice, you should focus on the third normal form (3NF). When you create a database, verify that your schema is in 3NF.
The normal forms below 3NF—the first normal form (1NF) and the second normal form (2NF)—are intermediate steps toward 3NF. The higher normal forms (4NF and 5NF) are less frequently used.
The Boyce-Codd normal form (BCNF) is also sometimes of interest. Boyce and Codd defined BNCF to replace 3NF because 3NF does not satisfactorily treat the general case. However, by this time, 3NF had already been adopted and renaming BCNF to 3NF would have caused confusion. (BCNF is sometimes called 3.5 NF.)
In practice, people tend to use 3NF rather than BCNF, especially since 3NF and BCNF are equivalent in most cases.
The Intuition Behind 3NF
Very informally, 3NF can be explained like this:
A table is in 3NF if and only if each record consists of a primary key that identifies some entity and a set of mutually independent attributes that describe that entity in some way.
This requirement makes sense, right? It says that each table contains information about a single entity and nothing else. The table should not mix up information about different entities.
Let’s look at some examples of tables that are not in 3NF. These tables feel “wrong” even for someone who’s not familiar with normal forms because they mix up different concepts in one table.
Example 1: Naive Products
This table contains information about products, their suppliers, and their prices. We distinguish between the same products delivered by different suppliers. The same product delivered by different suppliers can have different prices.
What is the entity that this table describes? A product, delivered by a specific supplier. The product is identified by the product name and the supplier name. The price describes the product. All of these columns feel “right.”
What about the country of the supplier? There is something wrong with this column. It doesn’t describe the product. Instead, it describes the supplier.
We can see that the table Naive Products mixes up two different concepts: products and suppliers. To get the “right” design, we need to split the table into two: one with product information and the other with supplier information.
This new design feels “right.” Each table only contains attributes of one entity: either the products or the suppliers. Distinct concepts are kept in separate tables.
Example 2: Naive Course Editions
Let’s look at another example. In the table Naive Course Editions, we have information about different editions of courses. Each year, the courses might be taught by someone else.
Do you think that this design is “right”? What is the entity that this table describes?
Naive Course Editions
|course||year||teacher||teacher date of birth|
The main entity of this table is course editions in different years. Are all columns relevant to this? Not really.
A teacher's date of birth is an attribute of the teacher and not a course edition. This table mixes up course editions and teacher information. It should split into two tables: course editions and teachers.
|teacher||teacher date of birth|
These two tables feel “right.” Each table only deals with one entity: either the course editions or the teachers.
How To Remember the Rules for Normal Forms
Students often have trouble remembering the rules for the different normal forms. I have two tricks to share that help me remember and describe the normal forms.
The first trick is a mnemonic coined by William Kent:
"The key, the whole key, and nothing but the key, so help me Codd."
The second trick is a functional dependency diagram, which illustrates the functional dependencies. In the diagram, include all of the table columns and draw arrows representing the functional dependencies between the columns.
An arrow from column A to column B signifies that column B is functionally dependent on column A. An arrow from a set of columns C, represented by a box, to column D signifies that D is functionally dependent on the set C.
Here is a functional dependency diagram for the Naive Products table.
Let’s see how these two tricks can be applied to remember the normal forms.
1NF = The Key
I admit that the application of the mnemonic to the first normal form is a bit far-fetched. 1NF was defined for relations, not tables. The 1NF interpretation for tables says that the values in each column should not be tables and that the records in the table should be unique.
The uniqueness of records in the table is implemented by having a primary key in the table. Thus, the mnemonic the key means that a table in 1NF should have a (primary) key.
2NF = The Whole Key
The second normal form says that a table is in 1NF and that there are no partial dependencies on the candidate keys. All columns must be dependent on the whole key.
Let’s look at the functional dependency diagram for the Naive Products table.
The arrow from supplier to supplier country violates the 2NF rule. Rather than starting in the whole key (product, supplier), the arrow starts in part of the key, the supplier column. This arrow violates the 2NF rule.
The 2NF rule in terms of functional dependency diagrams states that there are no arrows that start inside a candidate key. All arrows that start in the candidate key must take the whole key into account.
3NF = Nothing but the Key (Attribute)
The third normal form says that the table is in 2NF and that every non-key attribute is non-transitively (directly) dependent on the candidate key. The columns can only be dependent on the key columns.
In terms of functional dependency diagrams, it means that there are no arrows between non-key columns. Let’s look at the functional dependency diagram for the table Naive Course Editions.
The arrow between teacher and teacher date of birth violates the 3NF rule. Both the teacher and teacher’s date of birth are non-key attributes, so the dependency between them is not allowed by 3NF. In 3NF, all non-key attributes should depend on nothing but key attributes.
BNCF = 3.5NF = Nothing but the Key
The “nothing but the key” mnemonic more accurately describes BCNF, the stronger version of 3NF. The BCNF rule says that all non-trivial functional dependencies must start with a superkey. In terms of functional dependency diagrams, all arrows must be out of candidate keys. Attributes depend on nothing but the key.
What’s the difference between 3NF and BCNF? It can only be seen if the table has overlapping composite (i.e., multicolumn) candidate keys.
This is quite rare. In most practical situations, it does not happen. If there are no overlapping composite keys, the 3NF and BCNF are equivalent.
Let’s look at an example of a table that is in 3NF but not in BCNF: the Favorite Food table. This table contains information about each person’s favorite food type.
The person and food type columns determine each person’s favorite food of a certain type. But favorite food alone determines its food type. This table has two candidate keys: (person, food type) and (person, favorite food).
|person||food type||favorite food|
Here is the functional dependency diagram for this table.
The arrow from favorite food to food type does not violate the 3NF rule: favorite food is part of a candidate key (person, favorite food), so it’s a key attribute. However, the arrow violates the BCNF rule. The arrow doesn’t come out of a candidate key but from a part of it.
In BCNF, all arrows must come out of candidate keys. Attributes depend on nothing but the key.
Normal Forms: A Recap
Let’s recap the mnemonic again.
The key, the whole key, and nothing but the key, so help me Codd.
- The key stands for 1NF: all tables must have a (primary) key because all records in the table must be unique.
- The whole key stands for 2NF: no functional dependencies on part of the key are allowed.
- Nothing but the key (attribute) stands for 3NF: no functional dependencies between non-key attributes are allowed.
- Nothing but the key stands for BCNF: only functional dependencies out of candidate keys are allowed.
If you liked this article, check out other articles about database normalization on our blog.
Vertabelo allows you to draw logical and physical ER diagrams directly in your browser. It supports PostgreSQL, SQL Server, Oracle, MySQL, Google BigQuery, Amazon Redshift, and other relational databases.
 Paraphrased from “An Introduction to Database Systems” by C.J. Date