Distributed database
Introduction
A distributed database is a database under the control of a central database management system (DBMS) in which the storage devices are not all attached to a common CPU. It may be stored on multiple computers in the same physical location, or dispersed over a network of interconnected computers.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database is divided into separate partitions, also called fragments. Each fragment may be replicated, so that redundant copies are available for failover (similar in spirit to RAID).
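As a rough illustration of how a replicated fragment can stay readable when one copy is lost, the sketch below tries each replica in turn. The `Site` class, the site names and `read_with_failover` are hypothetical names invented for this example; they do not correspond to any particular DBMS.

```python
class Site:
    """Hypothetical stand-in for one server holding a copy of a fragment."""

    def __init__(self, name, rows, online=True):
        self.name = name
        self.rows = rows
        self.online = online

    def read(self, key):
        # An offline site behaves like an unreachable network host.
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.rows[key]


def read_with_failover(replicas, key):
    """Return (value, site name) from the first replica that answers."""
    for site in replicas:
        try:
            return site.read(key), site.name
        except ConnectionError:
            continue  # fall over to the next copy of the fragment
    raise ConnectionError("no replica of this fragment is reachable")


fragment = {42: "order #42"}
replicas = [
    Site("site_a", fragment, online=False),  # primary copy is down
    Site("site_b", dict(fragment)),          # redundant copy
]
value, served_by = read_with_failover(replicas, 42)
```

Here the read is served by the second copy because the first is marked offline. A real system would also have to keep the copies consistent on writes, which this sketch leaves out.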
Terms
- Horizontal fragments: subsets of tuples (rows) from a relation (table).
- Vertical fragments: subsets of attributes (columns) from a relation (table).
- Mixed fragment: a fragment produced by applying both horizontal and vertical fragmentation.
- Homogeneous distributed database: uses one DBMS (e.g. Oracle).
- Heterogeneous distributed database: uses multiple DBMSs (e.g. Oracle, Microsoft SQL Server and PostgreSQL).
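The three kinds of fragment defined above can be sketched on a toy relation. The `employees` data and the helper function names are assumptions made for this example only. Each vertical fragment keeps the key column, on the assumption that fragments must be rejoinable into the original relation.

```python
# Toy relation: each dict is one tuple (row); keys are attributes (columns).
employees = [
    {"id": 1, "name": "Ada", "dept": "sales", "salary": 50000},
    {"id": 2, "name": "Bob", "dept": "engineering", "salary": 65000},
    {"id": 3, "name": "Cy", "dept": "sales", "salary": 52000},
]


def horizontal_fragment(relation, predicate):
    """Keep whole rows that satisfy the predicate (a subset of tuples)."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation, attributes, key="id"):
    """Keep only the chosen columns, plus the key so rows can be rejoined."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def mixed_fragment(relation, predicate, attributes, key="id"):
    """Apply horizontal fragmentation, then vertical fragmentation."""
    selected = horizontal_fragment(relation, predicate)
    return vertical_fragment(selected, attributes, key)


def in_sales(row):
    return row["dept"] == "sales"


sales_rows = horizontal_fragment(employees, in_sales)          # rows 1 and 3
payroll_cols = vertical_fragment(employees, ["salary"])        # id + salary
sales_payroll = mixed_fragment(employees, in_sales, ["salary"])
```

In this sketch `sales_rows` could live at the sales department's site, while `payroll_cols` could live wherever payroll is processed.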
Users access the distributed database through:
- Local applications: applications which do not require data from other sites.
- Global applications: applications which do require data from other sites.
Distributed databases
Care must be taken with a distributed database to ensure that:
- The distribution is transparent – users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
- Transactions are transparent – each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
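One way to picture a transaction divided into subtransactions is the simplified prepare-then-commit scheme below, loosely modelled on two-phase commit. The text above does not name a specific protocol, so this choice is an assumption, as are the `Participant` class, the account names and the rule that balances may not go negative. The sketch also assumes each site receives at most one subtransaction per global transaction and ignores site failures between the two phases.

```python
class Participant:
    """Hypothetical site holding account balances for one fragment."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self.pending = None

    def prepare(self, account, delta):
        """Phase 1: vote yes only if the change keeps local integrity."""
        if account not in self.balances:
            return False
        if self.balances[account] + delta < 0:
            return False
        self.pending = (account, delta)
        return True

    def commit(self):
        """Phase 2: apply the prepared subtransaction."""
        account, delta = self.pending
        self.balances[account] += delta
        self.pending = None

    def abort(self):
        """Discard the prepared subtransaction without applying it."""
        self.pending = None


def run_global_transaction(subtransactions):
    """Run (site, account, delta) subtransactions as all-or-nothing."""
    prepared = []
    for site, account, delta in subtransactions:
        if site.prepare(account, delta):
            prepared.append(site)
        else:
            for p in prepared:  # one "no" vote aborts every site
                p.abort()
            return False
    for site in prepared:
        site.commit()
    return True


site_a = Participant("site_a", {"alice": 100})
site_b = Participant("site_b", {"bob": 20})

# Transfer 30 from alice (site A) to bob (site B): both sites commit.
ok = run_global_transaction([(site_a, "alice", -30), (site_b, "bob", 30)])

# Transfer 500: site A votes no, so site B's prepared change is discarded.
failed = run_global_transaction([(site_b, "bob", 500), (site_a, "alice", -500)])
```

After both calls the balances are 70 for alice and 50 for bob. The second transfer leaves no trace on either site, which is the integrity property the transparency requirement asks for.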
Advantages of distributed databases:
- Reflects organizational structure – database fragments are located in the departments they relate to.
- Local autonomy – a department can control the data about itself (as its staff are the ones most familiar with it).
- Improved availability – a fault in one database system will only affect one fragment, instead of the entire database.
- Improved performance – data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (In a distributed database, a high load on one module won't affect the other modules.)
- Economics – it costs less to create a network of smaller computers with the power of a single large computer.
- Modularity – systems can be modified, added and removed from the distributed database without affecting other modules (systems).
Disadvantages of distributed databases:
- Complexity – extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database – for example, joins can become prohibitively expensive when performed across multiple systems.
- Economics – increased complexity and a more extensive infrastructure mean extra labour costs.
- Security – remote database fragments must be secured and, because they are not centralized, so must the remote sites that host them. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).
- Difficult to maintain integrity – in a distributed database, enforcing integrity over a network may require too many network resources to be feasible.
- Inexperience – distributed databases are difficult to work with and, because the field is young, there is little readily available experience with proper practice.
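The cost of joins across systems, mentioned under complexity above, can be made concrete with a toy cost model that simply counts the rows sent over the network. The relations, site names and the `ship` helper are hypothetical. Real systems measure cost in bytes and latency rather than row counts.

```python
def ship(rows, counter):
    """Simulate sending rows to another site by counting them."""
    counter["rows_shipped"] += len(rows)
    return list(rows)


def join_on(left, right, key):
    """Hash join of two lists of dicts on a shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Orders are stored at site A, customers at site B.
orders_at_site_a = [{"cust_id": n % 3, "order_id": n} for n in range(1000)]
customers_at_site_b = [{"cust_id": i, "name": f"c{i}"} for i in range(3)]

# Plan 1: ship every order to site B and join there.
counter = {"rows_shipped": 0}
plan1 = join_on(ship(orders_at_site_a, counter), customers_at_site_b, "cust_id")
plan1_cost = counter["rows_shipped"]

# Plan 2: ship the small customers relation to site A and join there.
counter = {"rows_shipped": 0}
plan2 = join_on(orders_at_site_a, ship(customers_at_site_b, counter), "cust_id")
plan2_cost = counter["rows_shipped"]
```

Both plans produce the same 1000 joined rows, but the first moves 1000 rows across the network and the second only 3. In a single-site database neither transfer would be needed, and choosing between such plans is part of the extra design work.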
References
- Federal Standard 1037C
- Elmasri and Navathe, Fundamentals of Database Systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3