Back to articles list
- 3 minutes read

On Keys

Today we continue our series of posts on data normalization. In the previous post on data normalization I explained what functional dependency is. Today we will talk about candidate keys in a table.

A candidate key is a set of columns such that all other columns in the table are dependent on it, and the set is minimal, that is if you remove a column, then the resulting set is not a candidate key.

Example: Table Person




Remember the table person seen in the previous post and in our videos. We’ve already discovered that all columns depend on the column SSN. The set {SSN} has only one element. The set can’t be minimized – no columns depend on the empty set.

Thus SSN is the candidate key in the table person.

Example: Table Students




Another example we have already seen. The nontrivial functional dependencies in the table are:

student → semester
student, lecture → teaching_assistant

All columns depend on the set {student, lecture}. The semester column depends on the column student, so it depends also on the set {student, lecture}. The teaching_assistant column depends on the set {student, lecture} because of the second dependency.

The set {student, lecture} is a candidate key: it is a minimal set of columns on which all columns depend. You can’t remove the column student from the candidate key: the semester column does not depend on the lecture column. Likewise, you can’t remove the lecture column from the candidate key: the teaching_assistant column does not depend on the student column only.

Thus the pair {student, lecture} is the candidate key in the table students.

Primary Keys

The might be more than one candidate key in a table. If that’s the case you choose one of the candidate keys as primary. The database management systems use primary keys to store and manage data more effectively.

Example: Person With Passport




Let’s take a look at the table person2. We’ve added the passport column to the table person. The passport number determines personal data of a person in the same way as the SSN column. The table person2 has two candidate keys: SSN and passport. You can choose either column as primary key.

The set {SSN, passport} is NOT a candidate key. You can remove either column and you still have all other columns functionally dependent on a candidate key.

Example: Student With Identity Card

We’ve modified the table students by adding student identity card number. Now the semester column depends both on the student column and on the student_id_number column.

student → semester
student_id_number → semester



The table students2 has two candidate keys: {student, lecture} and {student_id_number, lecture}. Any of them might be chosen as primary key.

Superkeys

A set which is a superset of a candidate key (contains all columns in a key and more) is called a superkey. We’ll need to know what a superkey is when we talk about normalization.

Keys in Databases

In real-world databases you have to explicitly mark a set of columns as a key, either primary or alternate. The database uses the knowledge of primary keys to efficiently manage data. The set of columns might be a key (in the theoretical sense explained in this article) even if you don’t mark it as such. But then your database has no way of knowing that it is a key.

go to top

Our website uses cookies. By using this website, you agree to their use in accordance with the browser settings. You can modify your browser settings on your own. For more information see our Privacy Policy.