Relational model

The relational model for management of a database is a data model based on predicate logic and set theory.
The model
The fundamental assumption of the relational model is that all data are represented as mathematical relations, i.e., subsets of the Cartesian product of n sets. In the mathematical model, reasoning about such data is done in two-valued predicate logic (that is, without NULLs), meaning there are two possible evaluations for each proposition: either true or false. Data are operated upon by means of a relational calculus and algebra.
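As a minimal illustration of "a relation is a subset of a Cartesian product", the following Python sketch builds the product of two small domains and picks one subset of it. The domain contents and the names `names`, `cities`, `lives_in` and `holds` are hypothetical and chosen only for this example.

```python
from itertools import product

# Hypothetical domains (illustrative only, not taken from the article).
names = {"Ann", "Bob"}
cities = {"Oslo", "Rome"}

# The Cartesian product of the two domains: every possible (name, city) pair.
universe = set(product(names, cities))

# A relation is any subset of that product.
lives_in = {("Ann", "Oslo"), ("Bob", "Rome")}


def holds(relation, *values):
    """Two-valued evaluation: a proposition is true iff its tuple is in the relation."""
    return tuple(values) in relation
```

Here `holds(lives_in, "Ann", "Oslo")` evaluates to true and `holds(lives_in, "Ann", "Rome")` to false; there is no third outcome.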
The relational data model permits the designer to create a consistent logical model of information, to be refined through database normalization. The access plans and other implementation and operation details are handled by the DBMS engine, and should not be reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.
The basic relational building block is the domain, or data type. A tuple is an ordered multiset of attributes, which are ordered pairs of domain and value. A relvar (relation variable) is a set of ordered pairs of domain and name, which serves as the header for a relation. A relation is a set of tuples. Although these relational concepts are mathematically defined, they correspond loosely to traditional database concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of row.
The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. Thus, the relvars are not related to each other at design time: rather, designers use the same domain in several relvars, and if one attribute is dependent on another, this dependency is enforced through referential integrity.
Competition
Other models are the hierarchical model and the network model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex that it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer object-oriented databases, even though many of them are DBMS-construction kits rather than proper DBMSs.
The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.
History
The relational model was invented by Dr. Ted Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (1995), Date and Darwen show how the relational model can be extended with object-oriented features without compromising its fundamental principles.
Misimplementation
SQL, initially pushed as the standard language for relational databases, has always been in violation of the relational model. SQL DBMSs are thus not actually RDBMSs, and the current ISO SQL standard doesn't mention the relational model or use relational terms or concepts.
Implementation
There have been several attempts to produce a true implementation of the relational database model originally developed by Codd, Date, Darwen and others, but none have been popular successes so far. Rel (http://dbappbuilder.sourceforge.net/Rel.html) is one of the more recent attempts to do this.
Controversies
Codd himself proposed a three-valued logic version of the relational model, and a four-valued logic version has also been proposed, in order to deal with missing information. But these have never been implemented, presumably because of the attendant complexity. SQL NULLs were intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.
Design
Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design and the transactional performance.
There are two commonly used systems of diagramming to aid in the visual representation of the relational model: the entity-relationship diagram (ERD), and the related IDEF diagram used in the IDEF1X method created by the U.S. Air Force based on ERDs.
Example database
An idealized, very simple example of a description of some relvars and their attributes:
Customer(Customer ID, Tax ID, Name, Address, City, State, Zip, Phone)
Order(Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
Order Line(Order No, Order Line No, Product Code, Qty)
Invoice(Invoice No, Customer ID, Order No, Date, Status)
Invoice Line(Invoice No, Line No, Product Code, Qty Shipped)
Product(Product Code, Product Description)
In this design we have six relvars: Customer, Product, Order, Order Line, Invoice, and Invoice Line. The bold, underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.
Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other candidate keys, which are then called alternate keys.
A candidate key is a unique identifier enforcing that no tuple will be duplicated; a duplicate would turn the relation into something else, namely a bag, by violating the basic definition of a set. A key can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.
Set Theory Formulation
Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name" and we will usually use the variables r, s, t, ... and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.
Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:
 Def. A tuple is a partial function from attribute names to atomic values.
 Def. A header is a finite set of attribute names.
 Def. The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.
The next definition introduces the notion of relation, which formalizes the contents of a table as it is defined in the relational model.
 Def. A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.
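The definitions of tuple, header, projection and relation can be sketched in Python as follows. This is one possible encoding under stated assumptions, not a canonical one: a tuple is stored as a frozenset of (attribute, value) pairs so that tuples are hashable and can be collected in a set, and the helper names `make_tuple`, `project`, `domain` and `make_relation` are hypothetical.

```python
def make_tuple(**attrs):
    """A tuple: a finite function from attribute names to atomic values,
    stored as a frozenset of (attribute, value) pairs."""
    return frozenset(attrs.items())


def project(t, attributes):
    """t[A]: keep only the pairs (a, v) of t whose attribute a is in A."""
    return frozenset((a, v) for (a, v) in t if a in attributes)


def domain(t):
    """The set of attribute names on which the tuple is defined."""
    return frozenset(a for (a, _) in t)


def make_relation(header, body):
    """A relation (H, B): every tuple in the body B must have domain H."""
    header = frozenset(header)
    body = frozenset(body)
    if any(domain(t) != header for t in body):
        raise ValueError("every tuple in the body must have the header as its domain")
    return (header, body)


# Hypothetical example data.
ann = make_tuple(name="Ann", city="Oslo")
bob = make_tuple(name="Bob", city="Rome")
person = make_relation({"name", "city"}, {ann, bob})
```

Because the body is a set, adding the same tuple twice leaves the relation unchanged, which matches the requirement that a relation contains no duplicate tuples.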
Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic, except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.
 Def. A relation universe U over a header H is a nonempty set of relations with header H.
 Def. A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H.
 Def. A relation satisfies the relation schema (H, C) if it has header H and satisfies C.
Key constraints and functional dependencies
One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.
 Def. A superkey is written as a finite set of attribute names.
 Def. A superkey K holds in a relation (H, B) if K ⊆ H and there are no two distinct tuples t_{1} and t_{2} in B such that t_{1}[K] = t_{2}[K].
 Def. A superkey holds in a relation universe U over a header H if it holds in all relations in U.
 Def.  A superkey K holds as a candidate key for a relation universe U over H if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.
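The superkey definition can be checked mechanically for a single relation. The sketch below assumes tuples are encoded as frozensets of (attribute, value) pairs; the function names are hypothetical. Note that the candidate-key definition above quantifies over a whole relation universe, whereas `candidate_key_holds` only inspects one relation, that is, it treats the universe as the singleton containing that relation.

```python
def project(t, attributes):
    """t[K]: restrict a tuple (frozenset of pairs) to the attributes in K."""
    return frozenset((a, v) for (a, v) in t if a in attributes)


def superkey_holds(key, header, body):
    """K holds in (H, B) if K is a subset of H and no two distinct tuples agree on K."""
    key = frozenset(key)
    if not key <= frozenset(header):
        return False
    seen = set()
    for t in set(body):          # set(): only distinct tuples are compared
        p = project(t, key)
        if p in seen:
            return False
        seen.add(p)
    return True


def candidate_key_holds(key, header, body):
    """K is a superkey of this one relation and no proper subset of K is.
    Checking the subsets with one attribute removed is enough, because every
    superset (within H) of a superkey is again a superkey."""
    key = frozenset(key)
    if not superkey_holds(key, header, body):
        return False
    return not any(superkey_holds(key - {a}, header, body) for a in key)


# Hypothetical example data.
header = {"id", "tax_id", "name"}
body = {
    frozenset({("id", 1), ("tax_id", "T1"), ("name", "Ann")}),
    frozenset({("id", 2), ("tax_id", "T2"), ("name", "Ann")}),
}
```

In this example `{"id"}` and `{"tax_id"}` both hold as candidate keys, `{"id", "name"}` holds only as a superkey, and `{"name"}` does not hold at all.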
 Def. A functional dependency (or FD for short) is written as X → Y with X and Y finite sets of attribute names.
 Def. A functional dependency X → Y holds in a relation (H, B) if X and Y are subsets of H and for all tuples t_{1} and t_{2} in B it holds that if t_{1}[X] = t_{2}[X] then t_{1}[Y] = t_{2}[Y].
 Def. A functional dependency X → Y holds in a relation universe U over a header H if it holds in all relations in U.
 Def. A functional dependency is trivial under a header H if it holds in all relation universes over H.
 Theorem A FD X → Y is trivial under a header H iff Y ⊆ X ⊆ H.
 Theorem A superkey K holds in a relation universe U over H iff K ⊆ H and K → H holds in U.
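Whether a functional dependency holds in one given relation can be tested directly from the definition. The following sketch again assumes tuples encoded as frozensets of (attribute, value) pairs; `fd_holds` and `is_trivial` are hypothetical helper names, and `is_trivial` simply applies the first theorem above rather than enumerating relation universes.

```python
def project(t, attributes):
    """Restrict a tuple (frozenset of pairs) to the given attributes."""
    return frozenset((a, v) for (a, v) in t if a in attributes)


def fd_holds(lhs, rhs, header, body):
    """X -> Y holds in (H, B) if X and Y are subsets of H and tuples that
    agree on X also agree on Y."""
    lhs, rhs, header = frozenset(lhs), frozenset(rhs), frozenset(header)
    if not (lhs <= header and rhs <= header):
        return False
    image = {}                       # maps each t[X] to the t[Y] seen with it
    for t in body:
        x, y = project(t, lhs), project(t, rhs)
        if image.setdefault(x, y) != y:
            return False             # same X-values, different Y-values
    return True


def is_trivial(lhs, rhs, header):
    """By the theorem: X -> Y is trivial under H iff Y is a subset of X and X of H."""
    return frozenset(rhs) <= frozenset(lhs) <= frozenset(header)


# Hypothetical example data.
header = {"id", "city", "country"}
body = {
    frozenset({("id", 1), ("city", "Oslo"), ("country", "NO")}),
    frozenset({("id", 2), ("city", "Oslo"), ("country", "NO")}),
    frozenset({("id", 3), ("city", "Rome"), ("country", "IT")}),
}
```

With this data `{"city"} → {"country"}` holds while `{"country"} → {"id"}` does not, and `{"id"} → header` holds, in line with `{"id"}` being a superkey of this relation.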
 Def. (Armstrong's rules) Let S be a set of FDs then the closure of S under a header H, written as S^{+}, is the smallest superset of S such that:
 (reflexivity) if Y ⊆ X ⊆ H then X → Y in S^{+}
 (transitivity) if X → Y in S^{+} and Y → Z in S^{+} then X → Z in S^{+}
 (augmentation) if X → Y in S^{+} and Z ⊆ H then X∪Z → Y∪Z in S^{+}
 Theorem Armstrong's rules are sound and complete, i.e., given a header H and a set S of FDs that only contain subsets of H then the FD X → Y is in S^{+} iff it holds in all relation universes over H in which all FDs in S hold.
 Def. If X is a finite set of attributes and S a finite set of FDs then the completion of X under S, written as X^{+}, is the smallest superset of X such that:
 if Y → Z in S and Y ⊆ X^{+} then Z ⊆ X^{+}
The completion of an attribute set can be used to compute if a certain dependency is in the closure of a set of FDs.
 Theorem Given a header H and a set S of FDs that only contain subsets of H it holds that X → Y is in S^{+} iff Y ⊆ X^{+}.
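The completion X^{+} can be computed by a simple fixed-point iteration, and the theorem then gives a membership test for S^{+}. The sketch below is one straightforward way to do this, not an optimized algorithm; FDs are represented as (X, Y) pairs of sets, and the names `completion` and `implies` are hypothetical.

```python
def completion(attrs, fds):
    """Smallest superset X+ of attrs such that Y -> Z in fds and Y within X+
    implies Z within X+. `fds` is an iterable of (Y, Z) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:                   # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def implies(fds, lhs, rhs):
    """By the theorem: X -> Y is in S+ iff Y is a subset of X+."""
    return frozenset(rhs) <= completion(lhs, fds)


# Hypothetical example: S = { A -> B, B -> C }.
S = [({"A"}, {"B"}), ({"B"}, {"C"})]
```

For this S, the completion of {A} is {A, B, C}, so A → C is in S^{+} (it follows by transitivity), while C → A is not.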
 Algorithm (deriving candidate keys from FDs)
 INPUT: a set S of FDs that contain only subsets of a header H
 OUTPUT: the set C of superkeys that hold as candidate keys in all relation universes over H in which all FDs in S hold
 begin
   C := ∅;          // found candidate keys
   Q := { H };      // superkeys that contain candidate keys
   while Q <> ∅ do
     let K be some element from Q;
     Q := Q - { K };
     minimal := true;
     for each X → Y in S do
       K' := (K - Y) ∪ X;   // derive new superkey
       if K' ⊂ K then
         minimal := false;
         Q := Q ∪ { K' };
       fi
     od
     if minimal and there is not a subset of K in C then
       remove all supersets of K from C;
       C := C ∪ { K };
     fi
   od
 end
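The algorithm above translates almost line by line into Python. The following is a direct transcription under the assumption that FDs are given as (X, Y) pairs of attribute sets; the function name `candidate_keys` and the example data are hypothetical, and no attempt is made to avoid processing the same superkey more than once.

```python
def candidate_keys(header, fds):
    """Transcription of the pseudocode: derive candidate keys from FDs.
    `header` is a set of attribute names, `fds` a list of (X, Y) pairs of sets."""
    found = set()                            # C: candidate keys found so far
    queue = {frozenset(header)}              # Q: superkeys that contain candidate keys
    while queue:
        k = queue.pop()                      # let K be some element from Q
        minimal = True
        for lhs, rhs in fds:
            k2 = (k - frozenset(rhs)) | frozenset(lhs)   # derive new superkey
            if k2 < k:                       # proper subset of K
                minimal = False
                queue.add(k2)
        if minimal and not any(c <= k for c in found):
            found = {c for c in found if not c >= k}     # drop supersets of K
            found.add(k)
    return found


# Hypothetical example: H = {A, B, C}, S = { A -> B, B -> C }.
keys = candidate_keys({"A", "B", "C"}, [({"A"}, {"B"}), ({"B"}, {"C"})])
```

For this input the only candidate key is {A}. The loop terminates because every superkey added to the queue is a proper subset of the one it was derived from.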
 Def. Given a header H and a set of FDs S that only contain subsets of H an irreducible cover of S is a set T of FDs such that:
 S^{+} = T^{+}
 there is no proper subset U of T such that S^{+} = U^{+},
 if X → Y in T then Y is a singleton set and
 if X → Y in T and Z a proper subset of X then Z → Y is not in S^{+}.
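An irreducible cover can be computed with the usual three-step procedure: split right-hand sides into single attributes, drop left-hand-side attributes that are not needed, then drop FDs that follow from the rest. The article does not give this procedure, so the sketch below is an assumption about how one might compute such a cover, not the article's method. FDs are passed as (X, Y) pairs of sets and returned as (X, a) pairs with a single attribute a; the helper names are hypothetical. An irreducible cover is in general not unique, and this function returns only one of them.

```python
def completion(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def irreducible_cover(fds):
    # Step 1: one FD per right-hand attribute, duplicates removed, fixed order.
    cover = sorted({(frozenset(lhs), a) for lhs, rhs in fds for a in rhs},
                   key=lambda fd: (sorted(fd[0]), fd[1]))
    full = [(lhs, {a}) for lhs, a in cover]       # equivalent to the input set

    # Step 2: remove left-hand attributes that are not needed to derive a.
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            if a in completion(lhs - {b}, full):
                lhs = lhs - {b}
        reduced.append((lhs, a))
    reduced = list(dict.fromkeys(reduced))        # drop duplicates, keep order

    # Step 3: remove FDs that are implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [g for g in result if g != fd]
        lhs, a = fd
        if a in completion(lhs, [(l, {x}) for l, x in rest]):
            result = rest
    return set(result)


# Hypothetical example: S = { A -> BC, B -> C, AB -> C }.
T = irreducible_cover([({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})])
```

For this S the result is { A → B, B → C }: AB → C reduces to B → C because A is not needed on the left, and A → C is dropped because it follows from the other two by transitivity.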
References
 Codd, E. F. (1970). "A relational model of data for large shared data banks". Communications of the ACM, Vol. 13, No. 6, pp. 377-387. Retrieved from http://www.acm.org/classics/nov95/toc.html Sept. 4, 2004.
 Date, Christopher J. (2003). Introduction to Database Systems. 8th ed.
External links
 A Relational Model of Data for Large Shared Data Banks (http://www.acm.org/classics/nov95/toc.html)
 DMoz category (http://dmoz.org./Computers/Software/Databases/Relational/)
 Relational Model (http://c2.com/cgi/wiki?RelationalModel)