Candidate key

In the relational model a candidate key of a relation is a set of attributes of that relation such that (1) in all instances of that relation it holds that there are no two distinct tuples with the same values for these attributes and (2) there is not a proper subset for which (1) holds. Since a superkey is defined as a set of attributes for which (1) holds we can also define candidate keys as minimal superkeys, i.e., superkeys of which no proper subset is also a superkey.
The importance of candidate keys is that they tell us how we can identify individual tuples in instances of a relation. As such they are one of the most important types of database constraints that should be specified when designing a database schema. Since an instance of a relation is always a set it holds that every relation will have at least one candidate key. Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational) it is an important designrule to specify explicitly at least one candidate key for each relation. For practical reasons RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered as the preferred way to identify individual tuples. Foreign keys, for example, are usually required to arrive in such a primary key and not in any other of the candidate keys.
Example
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation R with attributes (A, B, C, D) that has only the following two legal instances r1 and r2:


Note that r2 differs from r1 only in the A and D fields of the last tuple.
For the instance r1 the following sets have the uniqueness property, i.e., there are no two tuples in the instance with the same values for the attributes in the set:
 {A,B}, {A,C}, {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
For the instance r2 the uniqueness property holds for the following sets;
 {B,D}, {C,D}, {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Since superkeys of a relation are those sets of attributes that have the uniqueness property for all instances of the relation and because we assume that r1 and r2 are all legal instances of R we can determine the set of superkeys of R by taking the intersection of the two lists:
 {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Finally we need to select those sets for which there is no proper subset in the list, which are in this case:
 {B,C}, {A,B,D}, {A,C,D}
These are indeed the candidate keys of relation R.
Note that we have to consider all instances of a relation to determine whether a certain set of attributes is a candidate key. For example, if we had considered only instance r1 then we would have concluded that {A,B} is a candidate key, which is incorrect. However, we can derive from a certain instance that a certain set is not a candidate key because either (1) it does not have the uniqueness property (e.g. {A,D} for r1) or (2) there is a proper subset that has the uniqueness property (e.g. {A,B,C} for r1).
Determining Candidate Keys
The previous example only illustrates the definition of candidate key and not how these are in practice determined. Since most relations have a large number or even infinitely many instances it would be impossible to determine all the sets of attributes with the uniqueness property for each instance. Instead it is easier to consider the sets of realworld entities that are represented by the relation and determine which attributes of the entities uniquely identify them. For example a relation Employee(Name, Address, Dept) probably represents employees and these are likely to be uniquely identified by a combination of Name and Address which is therefore a superkey, and unless the same holds for only Name or only Address, then this combination is also a candidate key.
In order to correctly determine the candidate keys it is important to determine all superkeys, which is especially difficult if the relation represent a set of relationships rather than a set of entities. Therefore it is often useful to attempt to find any "forgotten" superkeys by also determining the functional dependencies. Consider for example the relation Marriage(Man, Wife, Date) for which it will trivially hold that {Man, Wife, Date} is a superkey. If we assume that a certain person can only marry once on a given date then this implies the functional dependencies {Man,Date}>Wife and {Wife,Date}>Man. From this then we can derive more superkeys by applying the following rule:
 if S is a superkey and X>Y a functional dependency
 then (SY)+X is also a superkey
where '' is the set difference and '+' the set union. In this case this leads to the derivation of the superkeys {Man, Date} and {Wife, Date}.