Aho-Corasick algorithm
|
The Aho-Corasick algorithm is a string searching algorithm discovered by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of patterns (the "dictionary") within an input text. It matches all patterns "at once", so the complexity of the algorithm is linear in the size of the patterns plus the size of the search string.
Informally, the algorithm constructs a finite automaton first and then applies that automaton to the input text. When the pattern dictionary is known in advance (e.g. a computer virus database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use.
The Aho-Corasick algorithm forms the basis of the Unix command fgrep.
Sources
- Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Communications of the ACM 18(6):333–340, June 1975. Further details and full text at the ACM Digital Library site (http://doi.acm.org/10.1145/360825.360855)de:Aho-Corasick-Algorithmus