*A candidate key is an important concept in database normalization. Read on to find out what a candidate key is and how to check if a set of attributes is a candidate key.*

The **candidate key, **also simply called a **key,** is an important part of database design. It is the theoretical foundation for technical concepts like primary and alternate (unique) keys. Every database designer should be aware of how to identify candidate keys and how to choose the right one for their table.

The concept of the candidate key is taught in all university database courses as part of database normalization theory. The common problems you’ll face when learning about candidate keys are verifying if a given set of attributes is a candidate key and finding all candidate keys for a relation.

Understanding candidate keys is important for understanding the normal forms in database tables. This knowledge will help you remember the rules for the most common normal forms.

In this article, we’ll explain the concept of candidate keys in simple terms. Plus, we’ll show you how to verify if a set of attributes is a candidate key.

**Basic database normalization terminology**

*Before you read about candidate keys, make sure you are familiar with the basic normalization terminology. Let’s briefly review the most important terms.*

A **relation** is the theoretical name for a database table. A relation (table) has a name and consists of named attributes (columns).

A **functional dependency** in a relation (*A -> B*) tells you that whenever two rows have the same values for all attributes in set A, they will also have the same values for all attributes in set B.

The **closure** of a set of attributes is the set of those attributes that can be functionally determined from this set. You can review the algorithm to compute the closure of attributes here.

## Superkeys

Informally, a candidate key is a set of attributes that uniquely identify a row.

By definition, a candidate key is a minimal superkey. So, what does this mean? A **superkey** is an attribute or a set of attributes such that its closure is all attributes in the relation.

Let’s see some examples. Here, we have the CourseEditions table. It stores information about course editions.

Each year, a given course can be taught by a different teacher, with a different price and different limit on spots. We thus have the following functional dependencies:

*id -> course, year, teacher, price, spots*– the ID determines all other attributes*course, year -> id, teacher, price, spots*– the course and year determine the ID, teacher, price, and spots.

CourseEditions

id | course | year | teacher | price | spots |
---|---|---|---|---|---|

1 | Databases | 2019 | Chris Cape | 100 | 45 |

2 | Mathematics | 2019 | Daniel Parr | 80 | 34 |

3 | Databases | 2020 | Jennifer Clock | 110 | 30 |

What are the superkeys in this table? First, all of the attributes form a superkey, so the set *{id, course, year, teacher, price, spots}* is a superkey. Remember that the set of all attributes is a superkey in all tables.

Are there any smaller superkeys in this table? Yes, there are. The set *{id}* is a superkey. We have the functional dependency *id -> course, year, teacher, price, spots*, and of course, we have the trivial dependency *id -> id*. Once we have the *id,* we can determine all of the other attributes from the functional dependencies.

The set *{course, year}* is also a superkey. We have the functional dependency *course, year -> id, teacher, price, spots*, and we have the trivial functional dependencies *course -> course* and *year -> year*. Once we have *course* and *year*, we can determine all of the other attributes from the functional dependencies.

The set *{id, course, year, teacher}* is also a superkey. We have *id*, *course*, and *year*. So, we can determine all of the other attributes in the table with these three attributes.

On the other hand, the set *{teacher}* is not a superkey. If we know the *teacher, *we can’t determine any other attribute other than the teacher. The set *{teacher, price}* is also not a superkey. Once we have *teacher* and *price*, we can’t determine any more attributes.

## Minimal Superkeys

Not all superkeys are candidate keys. To be a candidate key, a superkey must be **minimal,** which means that if you take any attributes out of it, it won't be a superkey anymore. Let’s look at some examples.

The set *{id} *is a superkey, and it’s minimal. You can’t take attributes out of it, because you’ll then have an empty set, and an empty set is not a superkey. Thus, the set *{id}* is a candidate key.

The set *{course, year} *is also a superkey and a candidate key. If you take any of the attributes out of it, the remaining set is no longer a superkey. You need both *course* and *year* to determine the other attributes in the set.

However, the set *{id, course, year, teacher}* is a superkey but not a candidate key. For example, if you remove the attribute *teacher,* the remaining set is still a superkey. In fact, in this case, you can remove any attribute from *{id, course, year, teacher}*, and the remaining set will still be a superkey.

Note that a minimal superkey doesn’t mean the superkey with the smallest number of elements. Both *{id}* and *{course, year}* are candidate keys even though they have a different number of elements.

## Algorithm: Verifying That a Set of Attributes Is a Candidate Key

This is the common database design problem: how do you verify if a set of attributes is a candidate key?

Here’s the algorithm to verify it:

- Step 1: Check if the given set is a superkey. Compute the closure of attributes in the set. If the closure is the set of all attributes, the set is a superkey.
- Step 2: Check if the superkey is minimal. Remove each attribute, one at a time. If the remaining set is a superkey, the superkey is not minimal, and the set is not a candidate key. If you can’t remove any of the attributes and keep the superkey property, the set is a candidate key.

For example, let’s check if the set *{course, year}* is indeed a candidate key.

- Step 1: Let’s compute the closure of
*{course, year}.*Using the closure algorithm, we conclude that the closure is indeed*{id, course, year, teacher, price, spots}.*Thus, the set*{course, year}*is indeed a superkey. - Step 2. Let’s try to remove
*course*from the set. We’re left with the set*{year}.*There is no functional dependency with just*year*as the left-hand-side. Thus, the closure of this set is*{year}*. Similarly, when we remove the attribute*year,*the closure of the remaining set is*{course}.*Neither*{year}*nor*{course}*are superkeys, so the set*{course, year}*is a minimal superkey and thus, a candidate key.

If you liked this article, check out other normalization articles on our blog.

If you’re a student taking database classes, make sure to create a free Academic Account in Vertabelo, our online ER diagram drawing tool. It allows you to draw logical and physical ER diagrams directly in your browser.

Vertabelo supports PostgreSQL, SQL Server, Oracle, MySQL, Google BigQuery, Amazon Redshift, and other relational databases. Try it out, and see how easy it is to get started!