# Normalization in Relational Databases: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF)

*What is database normalization? What are the different normal forms, and what do they do? Find out in this article.*

Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place.

The theory of normal forms gives rigorous meaning to these informal concepts. There are many **normal forms**. In this article, we’ll review the most basic:

- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)

There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.

So, what is this theory of normal forms? It deals with the mathematical construct of **relations** (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.

Don’t worry if this sounds complicated; I promise it will get clearer as we go through each step. Let’s start with 1NF – the first step.

## First Normal Form (1NF)

A relation is in **first normal form** (1NF) if (and only if):

- Each attribute contains only one value.
- All attribute values are atomic, which means they can’t be broken down into anything smaller.

In practice, 1NF means that you should not have lists or other composite structures as attribute values. Below is an example of a relation that does not satisfy 1NF criteria:

**Student courses**

student | courses |
---|---|

Jane Smith | Databases, Mathematics |

John Lipinsky | English Literature, Databases |

Dave Beyer | English Literature, Mathematics |

This relation is not in 1NF because the *courses* attribute has multiple values. Jane Smith is assigned to two courses (Databases and Mathematics), and they are stored in one field as a comma-separated list. This list can be broken down into smaller elements (i.e. course subjects: databases as one element, mathematics as another), so it’s not an atomic value.

To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple:

**Student courses**

student | course |
---|---|

Jane Smith | Databases |

Jane Smith | Mathematics |

John Lipinsky | English Literature |

John Lipinsky | Databases |

Dave Beyer | English Literature |

Dave Beyer | Mathematics |

If you’re interested in reading more about the first normal form, I recommend the article What Is the Actual Definition of First Normal Form? by my colleague Konrad Zdanowski.

## Second Normal Form (2NF)

A relation is in **second normal form** (2NF) if and only if:

- It is in 1NF.
- No non-prime attributes are functionally dependent on a subset of the candidate key(s). In other words,
**any column that’s not a key column is dependent on the whole information in the candidate key**.

What does this mean? If the value of attribute A is determined by the value of attribute S, then *A* is functionally dependent on S. For example, your age is functionally dependent on your date of birth. For more on functional dependencies, see this article.

Let’s go back to the idea of candidate keys and non-prime attributes. What are they?

- A
**candidate key**is a minimal set of attributes that determines the other attributes included in the relation. It’s minimal in that if you removed one attribute, the remaining attributes do not form a candidate key. For a more detailed exploration of keys, see this article. - A
**non-prime attribute**is an attribute that is not part of the candidate key. However, for a relation to be 2NF, the information stored by non-prime attributes must be related to the whole information in the candidate key.

Informally, the second normal form states that **all attributes must depend on the entire candidate key**.

Let’s see an example of a relation that does not satisfy 2NF. The underlined attributes are the candidate key.

**Bike parts warehouse**

part | supplier | quantity | supplier country |
---|---|---|---|

Saddle | Bikeraft | 10 | USA |

Brake lever | Tripebike | 5 | Italy |

Top tube | UpBike | 3 | Canada |

Saddle | Tripebike | 8 | Italy |

- The
**candidate key**is the*part*and*supplier*set, which is expressed like this*{part, supplier}*. - The
**non-prime attributes**(which are not part of the candidate key) are*quantity*and*supplier country*. - There are
**functional dependencies**between*part, supplier,*and*quantity*(expressed as*part, supplier → quantity )*and between*supplier*and*supplier country (*expressed as*supplier → supplier country)*.

Why doesn’t this satisfy 2NF? The set *{part, supplier} *is the only candidate key of this relation. The value of *supplier country* is functionally dependent on *supplier*. *Supplier country* is not part of the candidate key, so it is a non-prime attribute and it is functionally dependent on** part** of the candidate key, not the entire candidate key *{part, supplier}*.

To transform this relation into 2NF, we need to split it into two relations: **Bike parts** (with the attributes *part*, *supplier*, and *quantity*) and **Suppliers** (with the attributes *supplier* and *supplier country*). This would look like as follows:

**Bike parts**

part | supplier | quantity |
---|---|---|

Saddle | Bikeraft | 10 |

Brake lever | Tripebike | 5 |

Top tube | UpBike | 3 |

Saddle | Tripebike | 8 |

The relation **Bike parts** is in 2NF because, as before, the *quantity *attribute depends on the pair *supplier* and *part*.

**Suppliers**

supplier | supplier country |
---|---|

Bikeraft | USA |

Tripebike | Italy |

UpBike | Canada |

The **Suppliers** relation is in 2NF because *supplier country* is functionally dependent on* supplier*, which is the candidate key of this relation.

Let’s see one more example of a non-2NF relation.

**Student course fees**

student | course | grade | course fee |
---|---|---|---|

Alison Brown | Databases | A | $100 |

Jason Liu | Mathematics | B | $150 |

Mariah Hill | Databases | B+ | $100 |

**Candidate key:***{student, course}***Non-prime attributes:***grade, course fee***Functional dependencies:***s**tudent, course → grade; course → course fee*

The following relation does not satisfy 2NF. The set *{student, course}* is the relation’s candidate key, but the value of *course fee* is functionally dependent on *course* alone. *Course fee* is a non-prime attribute, which is functionally dependent on only **part** of the candidate key.

To transform this into 2NF, we again split it into two relations: **Student courses **(with the attributes *student, course*, and *grade*) and **Courses** (with the attributes *course *and *course fee*). Thus, we avoid the partial dependency in the non-2NF relation above.

**Student course**

student | course | grade |
---|---|---|

Alison Brown | Databases | A |

Jason Liu | Mathematics | B |

Mariah Hill | Databases | B+ |

**Courses**

course | course fee |
---|---|

Databases | $100 |

Mathematics | $150 |

Why not try verifying for yourself that these relations are indeed 2NF?

Note that the 2NF partial dependency rule only kicks in if your relation has a composite candidate key (i.e. one that consists of multiple attributes). **All relations that have a single-attribute key are by definition in 2NF.**

## Third Normal Form (3NF)

A relation is in **third normal form** (3NF) if and only if:

- It is in second normal form (2NF).
- All non-prime attributes are directly (non-transitively) dependent on the entire candidate key.

In other words, non-prime attributes must be functionally dependent on the key(s), but they must not depend on another non-prime attribute. **3NF non-prime attributes depend on “nothing but the key”.**

Let’s see a non-3NF relation:

**Order information**

order_id | date | customer | customer email |
---|---|---|---|

1/2020 | 2020-01-15 | Jason White | white@example.com |

2/2020 | 2020-01-16 | Mary Smith | msmith@mailinator.com |

3/3030 | 2020-01-17 | Jacob Albertson | jasobal@example.com |

4/2020 | 2020-01-18 | Bob Dickinson | bob@fakemail.com |

**Candidate key:***order_id***Non-prime attributes:***date, customer, customer email***Functional dependencies:***date*depends on*order_id (order_id → date)*;*customer*depends on*order_id (order_id → customer)*, and*customer email*depends on*customer (customer → customer email)*.

This relation does not satisfy 3NF. The only candidate key in this relation is *order_id*. The value of *customer email* is functionally dependent on the *customer* attribute, which is a non-prime attribute. Thus, the relation violates 3NF.

Once again, we split this into two relations: **Orders** (with the attributes *order_id*, *date*, and *customer*) and **Customers** (with the attributes *customer* and *customer email*):

**Orders**

order_id | date | customer |
---|---|---|

1/2020 | 2020-01-15 | Jason White |

2/2020 | 2020-01-16 | Mary Smith |

3/3030 | 2020-01-17 | Jacob Albertson |

4/2020 | 2020-01-18 | Bob Dickinson |

**Customers**

customer | customer email |
---|---|

Jason White | white@example.com |

Mary Smith | msmith@mailinator.com |

Jacob Albertson | jasobal@example.com |

Bob Dickinson | bob@fakemail.com |

**Orders** is in 3NF because the *date* and *customer* attributes do not violate the rule of 3NF; their values depend on the *order_id* number. **Customers** is in 3NF because *customer email* is functionally dependent on *customer*, which is the candidate key of this relation. In both cases, **all non-prime attributes depend on the candidate key**.

Let’s see one more non-3NF example.

**Courses**

course | year | teacher | teacher date of birth |
---|---|---|---|

Databases | 2019 | Chris Cape | 1974-10-12 |

Mathematics | 2019 | Daniel Parr | 1985-05-17 |

Databases | 2020 | Jennifer Clock | 1990-06-09 |

**Candidate key:***{course, year}***Non-prime attributes:***teacher, teacher date of birth***Functional dependencies:***teacher*depends on*course*and*year (course, year → teacher)*;*teacher date of birth*depends on*teacher (teacher → teacher date of birth)*

This relation does not satisfy 3NF. The only candidate key in this relation is *{course, year}*, but the value of *teacher date of birth* is functionally dependent on *teacher* – a non-prime attribute. This violates 3NF.

Guess how we’ll transform this into 3NF? That’s right; we split the relation. **Courses** gets the attributes *course, year,* and *teacher*; **Teachers** gets the attributes *teacher* and* teacher date of birth*:

**Courses**

course | year | teacher |
---|---|---|

Databases | 2019 | Chris Cape |

Mathematics | 2019 | Daniel Parr |

Databases | 2020 | Jennifer Clock |

**Teachers**

teacher | teacher date of birth |
---|---|

Chris Cape | 1974-10-12 |

Daniel Parr | 1985-05-17 |

Jennifer Clock | 1990-06-09 |

Try verifying that these relations are indeed in 3NF for yourself. How would you explain the changes made?

## Database Normalization: Summary

First, second, and third normal forms are the basic normal forms in database normalization:

- The first normal form (1NF) states that each attribute in the relation is atomic.
- The second normal form (2NF) states that non-prime attributes must be functionally dependent on the entire candidate key.
- The third normal form (3NF) states that non-prime attributes must be directly (non-transitively) dependent on candidate keys.

Stay tuned to our blog for more articles on database normalization!