# A Unified View on Database Normal Forms: From the Boyce-Codd Normal Form to the Second Normal Form (2NF, 3NF, BCNF)

Normal forms for relations is a required topic of a database curriculum. Besides avoiding anomalies, which is already a big issue¹, knowing them certainly helps to understand what is going on in your or someone else’s database design. Even if at some point you decide to abandon a normal form, you should know what you are doing and how to pay a price for that.

Here, I want to discuss normal forms up to Boyce-Codd Normal Form (that is 2NF, 3NF and BCNF). The reason I want to pack together so many normal forms is that they can be understood better collectively: when you have them packed you can easily see the differences. In some aspects I will follow the approach from a recent book by C. J. Date, *Database Design and Relational Theory: Normal Forms and All That Jazz*. (It is certainly worth reading if you are theory oriented.)

### The Basics

Now, I need to recall some useful definitions. A *functional dependency FD*, e.g. A, B, C → D, in a relation **R** means that whenever you fix values of attributes A, B and C then there may be only one value of D in **R**.

Sometimes we have a set of attributes, say **X** such that **X** implies values of all attributes in **R**. (Yes, I write “**X**” bold to denote a set of attributes and “A”, “B” or “X” for a single attribute.) Then, such a set is a *superkey* in **R**. Moreover, if we cannot remove any attribute from this set without losing this property then it is a *proper key* or just a *key*.

In SQL one distinguishes one key in a table by labeling it as PRIMARY KEY. You can also point other keys in a table by labeling some sets of attributes as UNIQUE NOT NULL. And this really helps a database manager to process with data. In relation theory there is no need to treat primary key in a special way so let me just stress that we may have many keys in one relation. If one can extend a set of attributes **X** to a key in **R** then **X** is a *subkey* in **R**. It is not always possible since **X** may be, e.g., too big to be contained in a key which is not a *superkey*.

### Trivial Dependencies

For each attribute A we have A → A. All the more: A, B, C → A. Therefore, I call *trivial* each FD with one attribute repeating on the right and on the left of the arrow. Trivial **FDs necessarily hold in any relation**. There is also another kind of FDs which necessarily holds in any relation. **Let X be a superkey in R**. Then, **for any attribute A in R it holds that X** → A – just by the definition of a superkey. These are important kinds of FDs.

### All Together Now: 2NF, 3NF, BCNF

Now, let me present three short definitions. They are taken from the above mentioned book by C.J. Date.

### Definition 1 (2NF)

A relation **R** is in *2NF* if and only if, for every nontrivial functional dependency **X** → Y in **R** at least one holds: (a) **X** is a superkey, (b) Y is a subkey, (c) **X** is not a subkey.

### Definition 2 (3NF)

A relation **R** is in *3NF* if and only if, for every nontrivial functional dependency **X** → Y in **R** at least one holds: (a) **X** is a superkey, (b) Y is a subkey.

### Definition 3 (BCNF)

A relation **R** is in *BCNF* if and only if, for every nontrivial functional dependency **X** → Y in **R** at least one holds: (a) **X** is a superkey.

The first definition looks pretty complicated and the last one is beautifully simple. This is why I am convinced that it is better to start with **Boyce-Codd Normal Form** (**BCNF**). As you can see each definition forbids a certain type of functional dependency. If we have a relation in BCNF then the only nontrivial FDs are those which have a superkey on its left side. And, as I said, such FDs always hold so we have only necessary FDs. This is an important point: **in BCNF we have only these FDs which holds for each relation!** All the others are forbidden.

### Example: Elections

Let us look at an example to see why we don’t want any other FD. Here we have a simple relation **elections** for election results in various countries.

election_date | country | elected_president |
---|---|---|

November 2, 2004 | USA | George W. Bush |

November 4, 2008 | USA | Barack Obama |

November 6, 2012 | USA | Barack Obama |

March 26, 2000 | Russia | Vladimir Putin |

March 14, 2004 | Russia | Vladimir Putin |

March 2, 2012 | Russia | Vladimir Putin |

May 24, 2015 | Poland | Andrzej Duda |

Of course, if we know

and **election_date**

then we know who is **country**

. This may be written as**elected_president**

**election_date, country → elected_president**

On the other hand let us assume, not so unrealistically, that a person may be a president only in one country. Therefore, we have

**elected_president → country**

These are the only **FD**s in **elections**. It is easy to see that we have two keys: ** {election_date, country}** and

**. Are**

`{election_date, elected_president}`

**elections**in

**BCNF**? Well, our second FD,

**elected_president → country**

, violates the condition for BCNF. The sole attribute **elected_president**

is not a key!### Why Is That Bad?

Now, we see why it is bad to have an FD of the form **X** → Y, where **X** is not a superkey. The reason is **redundancy**. Since

is not a superkey its value may repeat in various rows. It follows that whenever the value of **elected_president**

repeats we have to repeat a **elected_president**

name. Therefore, if we are to update a **country**

name for a row with president Putin we have to do this for all rows with Putin. Otherwise, we violate the second FD and we no longer know which country had president Putin. (More on this in the article.)**country**

### Relaxing BCNF

Now, let us take a step back to 3NF. You may look at the definition of 3NF as giving you one excuse for not following BCNF. Of course, it allows any FD of the form **X** → Y to hold when **X** is a superkey. But even if **X** is not a superkey then **X** → Y is allowed when Y is a subkey. So this is just an additional alibi for an FD to be legal for 3NF.

In our example we see that

is indeed a subkey of the key **country**** {election_date, country}**.

**Elections satisfies 3NF!**

Finally, we can easily understand what’s going on in the definition of 2NF. Again, we are allowed to have **X** → Y when **X** is a superkey (who would forbid this?). Then, if **X** is not a superkey we may still use an alibi from 3NF by checking whether Y is a subkey. And if this also fails we have the last resort by checking whether **X** is not a subkey. If this is the case then we still proved our FD to be legal for 2NF.

We saw that 3NF adds one excuse to BCNF for an FD to be legal and 2NF is nothing more than just adding one more excuse. But, surprisingly, there is a much bigger difference between 3NF and BCNF than between 2NF and 3NF. Namely, you can always decompose a relation in such a way that your new relations are in 3NF and preserve all functional dependencies. On the other hand, you cannot do this for BCNF. The relation **elections** is one of the counterexamples. But that’s a different story.

--------------

¹ See the article “Update Anomalies” by Agnieszka Kozubek.