December 1, 2020 - 8 minutes read

Normalization in Relational Databases: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF)

Agnieszka is a Chief Content Officer at Vertabelo. Before coming to Vertabelo, she worked as a Java programmer. She has a PhD in mathematics and over 10 years of experience in teaching mathematics and computer science at the University of Warsaw. In her free time, she enjoys reading a good book, going mountain hiking and practicing yoga.

Tags:

What is database normalization? What are the different normal forms, and what do they do? Find out in this article.

Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place.

The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic:

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)

There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.

So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.

Don’t worry if this sounds complicated; I promise it will get clearer as we go through each step. Let’s start with 1NF – the first step.

First Normal Form (1NF)

A relation is in first normal form (1NF) if (and only if):

Each attribute contains only one value.
All attribute values are atomic, which means they can’t be broken down into anything smaller.

In practice, 1NF means that you should not have lists or other composite structures as attribute values. Below is an example of a relation that does not satisfy 1NF criteria:

Student courses

student	courses
Jane Smith	Databases, Mathematics
John Lipinsky	English Literature, Databases
Dave Beyer	English Literature, Mathematics

This relation is not in 1NF because the courses attribute has multiple values. Jane Smith is assigned to two courses (Databases and Mathematics), and they are stored in one field as a comma-separated list. This list can be broken down into smaller elements (i.e. course subjects: databases as one element, mathematics as another), so it’s not an atomic value.

To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple:

Student courses

student	course
Jane Smith	Databases
Jane Smith	Mathematics
John Lipinsky	English Literature
John Lipinsky	Databases
Dave Beyer	English Literature
Dave Beyer	Mathematics

If you’re interested in reading more about the first normal form, I recommend the article What Is the Actual Definition of First Normal Form? by my colleague Konrad Zdanowski.

Second Normal Form (2NF)

A relation is in second normal form (2NF) if and only if:

It is in 1NF.
No non-prime attributes are functionally dependent on a subset of the candidate key(s). In other words, any column that’s not a key column is dependent on the whole information in the candidate key.

What does this mean? If the value of attribute A is determined by the value of attribute S, then A is functionally dependent on S. For example, your age is functionally dependent on your date of birth. For more on functional dependencies, see this article.

Let’s go back to the idea of candidate keys and non-prime attributes. What are they?

A candidate key is a minimal set of attributes that determines the other attributes included in the relation. It’s minimal in that if you removed one attribute, the remaining attributes do not form a candidate key. For a more detailed exploration of keys, see this article.
A non-prime attribute is an attribute that is not part of the candidate key. However, for a relation to be 2NF, the information stored by non-prime attributes must be related to the whole information in the candidate key.

Informally, the second normal form states that all attributes must depend on the entire candidate key.

Let’s see an example of a relation that does not satisfy 2NF. The underlined attributes are the candidate key.

Bike parts warehouse

part	supplier	quantity	supplier country
Saddle	Bikeraft	10	USA
Brake lever	Tripebike	5	Italy
Top tube	UpBike	3	Canada
Saddle	Tripebike	8	Italy

The candidate key is the part and supplier set, which is expressed like this {part, supplier}.
The non-prime attributes (which are not part of the candidate key) are quantity and supplier country.
There are functional dependencies between part, supplier, and quantity (expressed as part, supplier → quantity ) and between supplier and supplier country (expressed as supplier → supplier country).

Why doesn’t this satisfy 2NF? The set {part, supplier} is the only candidate key of this relation. The value of supplier country is functionally dependent on supplier. Supplier country is not part of the candidate key, so it is a non-prime attribute and it is functionally dependent on part of the candidate key, not the entire candidate key {part, supplier}.

To transform this relation into 2NF, we need to split it into two relations: Bike parts (with the attributes part, supplier, and quantity) and Suppliers (with the attributes supplier and supplier country). This would look like as follows:

Bike parts

part	supplier	quantity
Saddle	Bikeraft	10
Brake lever	Tripebike	5
Top tube	UpBike	3
Saddle	Tripebike	8

The relation Bike parts is in 2NF because, as before, the quantity attribute depends on the pair supplier and part.

Suppliers

supplier	supplier country
Bikeraft	USA
Tripebike	Italy
UpBike	Canada

The Suppliers relation is in 2NF because supplier country is functionally dependent on supplier, which is the candidate key of this relation.

Let’s see one more example of a non-2NF relation.

Student course fees

student	course	grade	course fee
Alison Brown	Databases	A	$100
Jason Liu	Mathematics	B	$150
Mariah Hill	Databases	B+	$100

Candidate key: {student, course}
Non-prime attributes: grade, course fee
Functional dependencies: student, course → grade; course → course fee

The following relation does not satisfy 2NF. The set {student, course} is the relation’s candidate key, but the value of course fee is functionally dependent on course alone. Course fee is a non-prime attribute, which is functionally dependent on only part of the candidate key.

To transform this into 2NF, we again split it into two relations: Student courses (with the attributes student, course, and grade) and Courses (with the attributes course and course fee). Thus, we avoid the partial dependency in the non-2NF relation above.

Student course

student	course	grade
Alison Brown	Databases	A
Jason Liu	Mathematics	B
Mariah Hill	Databases	B+

Courses

course	course fee
Databases	$100
Mathematics	$150

Why not try verifying for yourself that these relations are indeed 2NF?

Note that the 2NF partial dependency rule only kicks in if your relation has a composite candidate key (i.e. one that consists of multiple attributes). All relations that have a single-attribute key are by definition in 2NF.

Third Normal Form (3NF)

A relation is in third normal form (3NF) if and only if:

It is in second normal form (2NF).
All non-prime attributes are directly (non-transitively) dependent on the entire candidate key.

In other words, non-prime attributes must be functionally dependent on the key(s), but they must not depend on another non-prime attribute. 3NF non-prime attributes depend on “nothing but the key”.

Let’s see a non-3NF relation:

Order information

order_id	date	customer	customer email
1/2020	2020-01-15	Jason White	white@example.com
2/2020	2020-01-16	Mary Smith	msmith@mailinator.com
3/3030	2020-01-17	Jacob Albertson	jasobal@example.com
4/2020	2020-01-18	Bob Dickinson	bob@fakemail.com

Candidate key: order_id
Non-prime attributes: date, customer, customer email
Functional dependencies: date depends on order_id (order_id → date); customer depends on order_id (order_id → customer), and customer email depends on customer (customer → customer email).

This relation does not satisfy 3NF. The only candidate key in this relation is order_id. The value of customer email is functionally dependent on the customer attribute, which is a non-prime attribute. Thus, the relation violates 3NF.

Once again, we split this into two relations: Orders (with the attributes order_id, date, and customer) and Customers (with the attributes customer and customer email):

Orders

order_id	date	customer
1/2020	2020-01-15	Jason White
2/2020	2020-01-16	Mary Smith
3/3030	2020-01-17	Jacob Albertson
4/2020	2020-01-18	Bob Dickinson

Customers

customer	customer email
Jason White	white@example.com
Mary Smith	msmith@mailinator.com
Jacob Albertson	jasobal@example.com
Bob Dickinson	bob@fakemail.com

Orders is in 3NF because the date and customer attributes do not violate the rule of 3NF; their values depend on the order_id number. Customers is in 3NF because customer email is functionally dependent on customer, which is the candidate key of this relation. In both cases, all non-prime attributes depend on the candidate key.

Let’s see one more non-3NF example.

Courses

course	year	teacher	teacher date of birth
Databases	2019	Chris Cape	1974-10-12
Mathematics	2019	Daniel Parr	1985-05-17
Databases	2020	Jennifer Clock	1990-06-09

Candidate key: {course, year}
Non-prime attributes: teacher, teacher date of birth
Functional dependencies: teacher depends on course and year (course, year → teacher); teacher date of birth depends on teacher (teacher → teacher date of birth)

This relation does not satisfy 3NF. The only candidate key in this relation is {course, year}, but the value of teacher date of birth is functionally dependent on teacher – a non-prime attribute. This violates 3NF.

Guess how we’ll transform this into 3NF? That’s right; we split the relation. Courses gets the attributes course, year, and teacher; Teachers gets the attributes teacher and teacher date of birth:

Courses

course	year	teacher
Databases	2019	Chris Cape
Mathematics	2019	Daniel Parr
Databases	2020	Jennifer Clock

Teachers

teacher	teacher date of birth
Chris Cape	1974-10-12
Daniel Parr	1985-05-17
Jennifer Clock	1990-06-09

Try verifying that these relations are indeed in 3NF for yourself. How would you explain the changes made?

Database Normalization: Summary

First, second, and third normal forms are the basic normal forms in database normalization:

The first normal form (1NF) states that each attribute in the relation is atomic.
The second normal form (2NF) states that non-prime attributes must be functionally dependent on the entire candidate key.
The third normal form (3NF) states that non-prime attributes must be directly (non-transitively) dependent on candidate keys.

Stay tuned to our blog for more articles on database normalization!

Tags:

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Database Normalization: Summary

You may also like

The Boyce-Codd Normal Form (BCNF)

Update Anomalies

What Is the Actual Definition of First Normal Form (1NF)?

Denormalization: When, Why, and How

On Keys