LZW
|
LZW (Lempel-Ziv-Welch) is a lossless data compression algorithm. The algorithm is derived from the LZ77 algorithm presented by Abraham Lempel and Jacob Ziv in a paper entitled "A Universal Algorithm for Sequential Data Compression" in the IEEE Transactions on Information Theory, May 1977. In 1978 they developed an improved version, now known as the LZ78 algorithm, which was later improved by Terry Welch in 1984, resulting in the LZW algorithm. (See also LZMA).
Contents |
Description of the algorithm
The key insight of the method is that it is possible to automatically build a dictionary of previously seen strings in the text being compressed. The dictionary does not have to be transmitted with the compressed text, since the decompressor can build it the same way the compressor does, and if coded correctly, will have exactly the same strings that the compressor dictionary had at the same point in the text.
The dictionary starts off with 256 entries, one for each possible character (single byte string). Every time a string not already in the dictionary is seen, a longer string consisting of that string appended with the single character following it in the text, is stored in the dictionary.
The output consists of integer indices into the dictionary. These initially are 9 bits each, and as the dictionary grows, can increase to up to 16 bits. A special symbol is reserved for "flush the dictionary" which takes the dictionary back to the original 256 entries, and 9 bit indices. This is useful if compressing a text which has variable characteristics, since a dictionary of early material is not of much use later in the text.
This use of variably increasing index sizes is one of Welch's contributions. Another was to specify an efficient data structure to store the dictionary.
Simple example of dictionary-based compression algorithm
In general, dictionary-based compression replaces phrases with tokens. If the number of bits in the token is less than the number of bits in the phrase, compression will occur. UNCOMPRESSED TEXT:
"I am stupid and because I am stupid, I can't even tell you that I am stupid."
COMPRESSED TEXT:
"$1 and because $1, I can't even tell you that $1. $1=[I am stupid]"
This is far from efficient, but it illustrates the compression through phrase replacement.
Uses
The method became moderately widely used in the program "compress" which became a more or less standard utility in Unix systems circa 1986. (It has since disappeared from many for both legal and technical reasons.) Several other popular compression utilities also used the method, or closely related ones.
It became very widely used after it became part of the GIF image format in 1987. It may also (optionally) be used in TIFF-files.
LZW compression provided a better compression ratio, in most applications, than any well known method available up to that time. It became the first widely used general purpose data compression method on computers. On large English texts, it typically compressed to about half original size. Other kinds of data were also quite usefully compressed in many cases.
Patent issues
Various patents have been issued in the USA and other countries for LZW and similar algorithms. LZ78 was covered by Template:US patent by Lempel, Ziv, Cohn, and Eastman, assigned to Sperry Corporation, later Unisys Corporation, filed on August 10, 1981, and presumably now expired. Two US patents were issued for the LZW algorithm: Template:US patent by Victor S. Miller and Mark N. Wegman and assigned to IBM, originally filed on June 1, 1983, and Template:US patent by Welch, assigned to Sperry Corporation, later Unisys Corporation, filed on June 20, 1983.
US Patent 4,558,302 is the one that has caused the most controversy. Unisys at one time granted royalty-free patent licenses to developers of free software and freely available proprietary software; the company terminated those licenses in August 1999. Many legal experts have concluded that the patent does not cover devices that can only uncompress LZW data and cannot compress it; for this reason, the popular Gzip program can read .Z files but cannot write them.
It was reported in Debian Weekly News (http://www.debian.org/News/weekly/2002/45/) based on a comp.compression thread (http://groups.google.com/groups?&threadm=a5aa8dd0.0208271613.3cd18da6%40posting.google.com), that the Unisys patent in the USA expired on December 20, 2002 - 17 years and 10 days after it was granted. Most other sources claim the patent expired in June 2003, 20 years after it was filed, because 35 USC §154(c)(1) (http://www.law.cornell.edu/uscode/35/154.html) specifies that patents subsisting as of six months after the enactment of the Uruguay Round Agreements Act last for the greater of 17 years after grant and 20 years after filing.
According to a statement on Unisys's web site, counterpart patents on LZW in the United Kingdom, France, Germany, Italy, and Japan have expired in June 2004, and the Canadian patent expired on July 7, 2004. The Canadian patent was the last known Unisys patent on the LZW algorithm still in effect.
IBM's US patent will expire on August 11 2006.
Lempel-Ziv-Welch vs. Ziv-Lempel-Welch
Although the LZW acronym obviously refers to the inventors as Lempel, Ziv and Welch, some people claim the intellectual property rights go to Ziv first, so the method must be called the Ziv-Lempel-Welch algorithm, and not the Lempel-Ziv-Welch algorithm.
External links
- United States Patent 4,558,302 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=4,558,302.WKU.&OS=PN/4,558,302&RS=PN/4,558,302)
- "LZW Data Compression", by Mark Nelson (http://www.dogma.net/markn/articles/lzw/lzw.htm) (DDJ Article with source code)
- Sad day... GIF patent dead at 20 (http://www.kyz.uklinux.net/giflzw.php) (shortened and maybe too simplified story, more detailed stores are found at the GIF page)
de:Lempel-Ziv-Welch-Algorithmus fr:Lempel-Ziv-Welch nl:Lempel Ziv Welch ja:LZW pl:LZW