Trie
|
In computer science, a trie is an ordered tree data structure that is used to store an associative array where the keys are strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.
The term trie comes from "retrieval". Due to its etymology some sources say it should be pronounced as "tree", while others encourage the use of "try" in order to distinguish it from the more general tree.
In the shown example, keys are listed in the nodes and values below them. Each complete English word has an integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.
Advantages and disadvantages
There are three main advantages of tries over binary search trees (BSTs):
- Looking up keys is faster. Looking up a key of length m takes only O(m) time; in the worst case in a BST, O(m²) time is required, because initial characters are examined repeatedly during multiple comparisons. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
- Tries require less space. Because the keys are not stored explicitly, only an amortized constant amount of space is needed to store each key.
- Tries make longest-prefix matching, where we wish to find the key sharing the longest possible prefix with a given key, efficient. They also allow one to associate a value with an entire group of keys that have a common prefix.
Although it seems restrictive to say a trie's key type must be a string, many common data types can be seen as strings; for example, an integer can be seen as a string of bits. Integers with common bit prefixes occur as map keys in many applications such as routing tables and address translation tables.
Tries are most useful when the keys are of varying lengths and we expect some key lookups to fail, because the key is not present. If we have fixed-length keys, and expect all lookups to succeed, then we can improve key lookup by combining every node with a single child (such as "i" and "in" above) with its child, producing a Patricia trie. This saves space in maps where long paths down the trie do not have branches fanning out, for example in maps where many keys have a long common prefix or where many portions of keys are composed of characters all unique.
See also
External links
- NIST's Dictionary of Algorithms and Data Structures: Trie (http://www.nist.gov/dads/HTML/trie.html)
- Tries (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/) by Lloyd Allison
- An Implementation of Double-Array Trie (http://linux.thai.net/~thep/datrie/datrie.html)zh:Trie