B-tree
|
B-trees are tree data structures that are most commonly found in databases and filesystems. B-trees keep data sorted and allow amortized logarithmic time insertions and deletions. B-trees generally grow from the bottom up as elements are inserted, whereas most binary trees grow down.
B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. Usually this value is set such that each node takes up a full disk block or an analogous size in secondary storage.
The idea behind B-trees is that inner nodes can have a variable number of child nodes within some pre-defined range. Hence, B-trees do not need re-balancing as frequently as other self-balancing binary search trees. The lower and upper bounds on the number of child nodes are fixed for a particular implementation. For example, in a 2-3 B-tree (often simply 2-3 tree), each internal node may have only 2 or 3 child nodes. A node is considered to be in an illegal state if it has an invalid number of child nodes.
The B-tree's creator, Rudolf Bayer, has not explained what the B stands for. The most common belief is that B stands for balanced, as all the leaf nodes are at the same level in the tree. B may also stand for Bayer, or for Boeing, because he was working for Boeing Scientific Research Labs.
Contents |
Node structures
Nodes in a B-Tree are usually represented as an ordered set of elements and child pointers. Every node but the root contains a minimum of L elements, a maximum of U elements, and a maximum of U+1 child pointers, for some arbitrary L and U. For all internal nodes, the number of child pointers is always one more than the number of elements. Since all leaf nodes are at the same height, nodes do not generally contain a way of determining whether they are leaf or internal.
Each inner node's elements act as separation values which divide its subtrees. For example, if an inner node has three child nodes (or subtrees) then it must have two separation values or elements a1 and a2. All values in the leftmost subtree will be less than a1 , all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.
Algorithms
Search
Search is performed in the typical manner, analogous to that in a binary search tree. Starting at the root, the tree is traversed top to bottom, choosing the child pointer whose separation values are on either side of the value that is being searched. Binary search is typically used within nodes to determine this location.
Insertion
For a node to be in an illegal state, it must contain a number of elements which is outside of the acceptable range.
- First, search for the position into which the node should be inserted. Then, insert the value into that node.
- If no node is in an illegal state then the process is finished.
- If some node has too many elements, it is split into two nodes, each with the minimum amount of elements. This process continues recursively up the tree until the root is reached. If the root is split, a new root is created. Typically the minimum and maximum number of elements must be selected such that the minimum is no more than one half the maximum in order for this to work.
Deletion
- First, search for the value which will be deleted. Then, remove the value from the node which contains it.
- If no node is in an illegal state then the process is finished.
- If some node is in an illegal state then there are two possible cases:
- Its sibling node, a child of the same parent node, can transfer one or more of its child nodes to the current node and return it to a legal state. If so, after updating the separation values of the parent and the two siblings the process is finished.
- Its sibling does not have an extra child because it is on the lower bound. In that case both siblings are merged into a single node and we recurse onto the parent node, since it has had a child node removed. This continues until the current node is in a legal state or the root node is reached, upon which the root's children are merged and the merged node becomes the new root.
Notes
Suppose L is the least number of children a node is allowed to have, while U is the most number. Then each node will always have between L and U children, inclusively, with one exception: the root node may have anywhere from 2 to U children inclusively. In other words, the root is exempt from the lower bound restriction, having a lower bound, 2, of its own. This allows the tree to hold small numbers of elements. The root having one child makes no sense, since the subtree attached to that child could simply be attached to the root. Giving the root no children is also unnecessary, since a tree with no elements is typically represented as having no root node.
Robert Tarjan proved that the amortized number of splits/merges is 2.
See also
References
Original papers:
- Rudolf Bayer, Binary B-Trees for Virtual Memory, ACM-SIGFIDET Workshop 1971, San Diego, California, Session 5B, p. 219-235.
- Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.
Summary:
- Donald E. Knuth, "The Art of Computer Programming", second edition, volume 3, section 6.2.4, 1997.
External links
- Animation of a 2-3-4 Tree (the simplest form of B-Tree) (http://www.cse.ohio-state.edu/~bondhugu/acads/234-tree/index.shtml)
- http://www.bluerwhite.org/btree
- B-Tree animation (Java Applet) (http://slady.net/java/bt/)
- NIST's Dictionary of Algorithms and Data Structures: B-tree (http://www.nist.gov/dads/HTML/btree.html)
- B-tree algorithms (http://www.semaphorecorp.com/btp/algo.html)
- B-tree variations (http://www.semaphorecorp.com/btp/var.html)
- Source code for a balanced tree (B-tree) (Windows required for test timings) (http://textelectric.net/btree.html)de:B-Baum