Back to articles list

September 29, 2015 - 6 minutes read

Design fundamentals

A Unified View on Database Normal Forms: From the Boyce-Codd Normal Form to the Second Normal Form (2NF, 3NF, BCNF)

Konrad Zdanowski

Assistant Professor at Cardinal Stefan Wyszyński University

Tags:

Normal forms for relations is a required topic of a database curriculum. Besides avoiding anomalies, which is already a big issue¹, knowing them certainly helps to understand what is going on in your or someone else’s database design. Even if at some point you decide to abandon a normal form, you should know what you are doing and how to pay a price for that.

Here, I want to discuss normal forms up to Boyce-Codd Normal Form (that is 2NF, 3NF and BCNF). The reason I want to pack together so many normal forms is that they can be understood better collectively: when you have them packed you can easily see the differences. In some aspects I will follow the approach from a recent book by C. J. Date, Database Design and Relational Theory: Normal Forms and All That Jazz. (It is certainly worth reading if you are theory oriented.)

The Basics

Now, I need to recall some useful definitions. A functional dependency FD, e.g. A, B, C → D, in a relation R means that whenever you fix values of attributes A, B and C then there may be only one value of D in R.

Sometimes we have a set of attributes, say X such that X implies values of all attributes in R. (Yes, I write “X” bold to denote a set of attributes and “A”, “B” or “X” for a single attribute.) Then, such a set is a superkey in R. Moreover, if we cannot remove any attribute from this set without losing this property then it is a proper key or just a key.

In SQL one distinguishes one key in a table by labeling it as PRIMARY KEY. You can also point other keys in a table by labeling some sets of attributes as UNIQUE NOT NULL. And this really helps a database manager to process with data. In relation theory there is no need to treat primary key in a special way so let me just stress that we may have many keys in one relation. If one can extend a set of attributes X to a key in R then X is a subkey in R. It is not always possible since X may be, e.g., too big to be contained in a key which is not a superkey.

Trivial Dependencies

For each attribute A we have A → A. All the more: A, B, C → A. Therefore, I call trivial each FD with one attribute repeating on the right and on the left of the arrow. Trivial FDs necessarily hold in any relation. There is also another kind of FDs which necessarily holds in any relation. Let X be a superkey in R. Then, for any attribute A in R it holds that X → A – just by the definition of a superkey. These are important kinds of FDs.

All Together Now: 2NF, 3NF, BCNF

Now, let me present three short definitions. They are taken from the above mentioned book by C.J. Date.

Definition 1 (2NF)

A relation R is in 2NF if and only if, for every nontrivial functional dependency X → Y in R at least one holds: (a) X is a superkey, (b) Y is a subkey, (c) X is not a subkey.

Definition 2 (3NF)

A relation R is in 3NF if and only if, for every nontrivial functional dependency X → Y in R at least one holds: (a) X is a superkey, (b) Y is a subkey.

Definition 3 (BCNF)

A relation R is in BCNF if and only if, for every nontrivial functional dependency X → Y in R at least one holds: (a) X is a superkey.

The first definition looks pretty complicated and the last one is beautifully simple. This is why I am convinced that it is better to start with Boyce-Codd Normal Form (BCNF). As you can see each definition forbids a certain type of functional dependency. If we have a relation in BCNF then the only nontrivial FDs are those which have a superkey on its left side. And, as I said, such FDs always hold so we have only necessary FDs. This is an important point: in BCNF we have only these FDs which holds for each relation! All the others are forbidden.

Example: Elections

Let us look at an example to see why we don’t want any other FD. Here we have a simple relation elections for election results in various countries.

election_date	country	elected_president
November 2, 2004	USA	George W. Bush
November 4, 2008	USA	Barack Obama
November 6, 2012	USA	Barack Obama
March 26, 2000	Russia	Vladimir Putin
March 14, 2004	Russia	Vladimir Putin
March 2, 2012	Russia	Vladimir Putin
May 24, 2015	Poland	Andrzej Duda

Of course, if we know election_date and country then we know who is elected_president. This may be written as

election_date, country → elected_president

On the other hand let us assume, not so unrealistically, that a person may be a president only in one country. Therefore, we have

elected_president → country

These are the only FDs in elections. It is easy to see that we have two keys: {election_date, country} and {election_date, elected_president}. Are elections in BCNF? Well, our second FD, elected_president → country, violates the condition for BCNF. The sole attribute elected_president is not a key!

Why Is That Bad?

Now, we see why it is bad to have an FD of the form X → Y, where X is not a superkey. The reason is redundancy. Since elected_president is not a superkey its value may repeat in various rows. It follows that whenever the value of elected_president repeats we have to repeat a country name. Therefore, if we are to update a country name for a row with president Putin we have to do this for all rows with Putin. Otherwise, we violate the second FD and we no longer know which country had president Putin. (More on this in the article.)

Relaxing BCNF

Now, let us take a step back to 3NF. You may look at the definition of 3NF as giving you one excuse for not following BCNF. Of course, it allows any FD of the form X → Y to hold when X is a superkey. But even if X is not a superkey then X → Y is allowed when Y is a subkey. So this is just an additional alibi for an FD to be legal for 3NF.

In our example we see that country is indeed a subkey of the key {election_date, country}. Elections satisfies 3NF!

Finally, we can easily understand what’s going on in the definition of 2NF. Again, we are allowed to have X → Y when X is a superkey (who would forbid this?). Then, if X is not a superkey we may still use an alibi from 3NF by checking whether Y is a subkey. And if this also fails we have the last resort by checking whether X is not a subkey. If this is the case then we still proved our FD to be legal for 2NF.

We saw that 3NF adds one excuse to BCNF for an FD to be legal and 2NF is nothing more than just adding one more excuse. But, surprisingly, there is a much bigger difference between 3NF and BCNF than between 2NF and 3NF. Namely, you can always decompose a relation in such a way that your new relations are in 3NF and preserve all functional dependencies. On the other hand, you cannot do this for BCNF. The relation elections is one of the counterexamples. But that’s a different story.

--------------
¹ See the article “Update Anomalies” by Agnieszka Kozubek.