A distributed database is a collection of database tables distributed over two or more servers.

There are three ways to distribute data:

  1. Fragmentation. Different locations store different parts of the database. Horizontal fragmentation stores different records (rows) on different servers. Vertical fragmentation stores different fields (columns) on different servers. This suits cases where different sites need mostly different data but must still be able to reach data held elsewhere.
  2. Downloading. A location takes a snapshot of the parts of the database it needs and works from that copy. This suits cases where the same data must be local at different sites but does not change often.
  3. Replication. A location keeps a replica of the parts of the database it needs and works from that copy. With replication, a change made to any replica is propagated automatically (and, ideally, immediately) to all other replicas of the same part. This suits cases where the same data must be local at different sites and must stay current.
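
The difference between horizontal and vertical fragmentation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` rows, the field names, and the server names (`sales_server`, `finance_server`) are all hypothetical, and plain dictionaries stand in for real servers. Note that each vertical fragment keeps the primary key so that the full records can be rebuilt later.

```python
# Hypothetical sample data: the table, fields, and server names below are
# illustrative assumptions, not taken from any real system.
customers = [
    {"id": 1, "name": "Ann", "region": "east", "credit_limit": 500},
    {"id": 2, "name": "Bob", "region": "west", "credit_limit": 900},
    {"id": 3, "name": "Cho", "region": "east", "credit_limit": 250},
]


def horizontal_fragments(rows, key):
    """Split whole records into groups by the value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def vertical_fragments(rows, primary_key, field_groups):
    """Split fields into groups; every fragment keeps the primary key."""
    return {
        server: [{f: row[f] for f in (primary_key, *fields)} for row in rows]
        for server, fields in field_groups.items()
    }


def rejoin_vertical(fragments, primary_key):
    """Rebuild full records by joining vertical fragments on the primary key."""
    merged = {}
    for rows in fragments.values():
        for row in rows:
            merged.setdefault(row[primary_key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Horizontal: each region's server holds complete records for its customers.
by_region = horizontal_fragments(customers, "region")

# Vertical: each department's server holds only the fields it works with.
by_fields = vertical_fragments(
    customers,
    "id",
    {"sales_server": ["name", "region"], "finance_server": ["credit_limit"]},
)
```

Here `by_region["east"]` holds the complete records for Ann and Cho, while `by_fields["finance_server"]` holds only `id` and `credit_limit` for every customer. Calling `rejoin_vertical(by_fields, "id")` reassembles the original records.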

Distributed databases exist for two main reasons. The first is performance: access is faster when the people who need certain data have that data on a server on-site. The second is integration: a distributed database is a way to combine data developed by different groups. In either case, a distributed database should be transparent, i.e., the distribution should enhance the functionality of the database, not hamper it.

Just as E. F. Codd laid down the rules of database normalization, C. J. Date proposed twelve rules for distributed databases. And just as no one follows all the rules of normalization, no one follows all the rules for distributed databases.

  1. Local autonomy. Each server in the distributed database should be independent and have control of its data.
  2. No reliance on a central site. A distributed database should not rely on a single site for its operation.
  3. Continuous operation. The entire distributed database should never have to shut down for maintenance.
  4. Location transparency and location independence. An application or user of the distributed database should not need to know where the data it needs is located.
  5. Fragmentation independence. An application or user of the distributed database should not need to know whether a table has been fragmented over several servers.
  6. Replication independence. An application or user of the distributed database should not need to know whether data has been replicated.
  7. Distributed query processing. Each server in the distributed database should be aware of how data is distributed for querying purposes.
  8. Distributed transaction management. A distributed database should be able to handle transactions involving multiple servers.
  9. Hardware independence. It should not matter what hardware is used for servers on the distributed database.
  10. Operating system independence. It should not matter what operating system is used for servers on the distributed database.
  11. Network independence. It should not matter what network protocols are used on the distributed database.
  12. DBMS independence. It should not matter which database management system (DBMS) is running on each server in the distributed database.
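
Rules 4 and 5 (location and fragmentation independence) can be illustrated with a small sketch. It assumes a hypothetical `DistributedTable` class that keeps a catalog of where fragments live and fans each query out to every server. The class names, the `attach` method, and the sample rows are inventions for illustration, and in-memory lists stand in for real servers. A real system would also prune servers that cannot hold matching rows.

```python
class Server:
    """Stand-in for one database server holding a fragment of a table."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def select(self, predicate):
        return [row for row in self.rows if predicate(row)]


class DistributedTable:
    """Presents several fragments as one logical table (hypothetical API)."""

    def __init__(self, servers):
        self._servers = list(servers)  # the catalog: where the fragments live

    def attach(self, server):
        """Register another server that holds a fragment of this table."""
        self._servers.append(server)

    def select(self, predicate):
        # Ask every server for matching rows, then merge the answers.
        results = []
        for server in self._servers:
            results.extend(server.select(predicate))
        return sorted(results, key=lambda row: row["id"])


east = Server("east", [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cho"}])
west = Server("west", [{"id": 2, "name": "Bob"}])
customers = DistributedTable([east, west])

# The caller queries the logical table and never names a server.
before = customers.select(lambda row: row["id"] >= 2)

# Re-fragment: move Cho's record to a new server. The query is unchanged.
north = Server("north", [east.rows.pop()])
customers.attach(north)
after = customers.select(lambda row: row["id"] >= 2)
```

Both `before` and `after` contain the records for Bob and Cho, even though Cho's record moved between servers in the meantime. The code that issues the query did not have to change, which is the point of the two rules.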


GeorgeHernandez.com. Some rights reserved.