LZMA
LZMA, short for Lempel-Ziv-Markov chain algorithm, is a data compression algorithm that was under development until 2001 and is used in the 7z format of the 7-Zip archiver and by StuffIt X. It uses a dictionary compression scheme somewhat similar to LZ77 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB). See also LZW.
Overview
The open-source LZMA compression library, written in C++, uses an improved LZ77 compression algorithm together with specific preprocessing routines for binaries. The output of the LZ77 stage is then entropy coded with a range coder.
Streams for literal data, repeated-sequence size and repeated-sequence location appear to be compressed separately.
Other concepts used include hash chains, binary trees and Patricia tries, which serve to find repeated sequences in the dictionary.
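The LZ77-style idea described above can be illustrated with a toy sketch in Python. It is not taken from the LZMA SDK: the function names, the window size and the minimum match length are illustrative assumptions, a brute-force search stands in for the hash chains and trees a real encoder would use, and no entropy coding is applied. The sketch keeps literals, match lengths and match distances in separate lists, mirroring the separate streams mentioned above.

```python
def lz77_tokenize(data: bytes, window: int = 4096, min_len: int = 3, max_len: int = 255):
    """Greedy LZ77 parse into flags, literals, match lengths and match distances."""
    flags, literals, lengths, distances = [], bytearray(), [], []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window (illustration only).
        for j in range(max(0, i - window), i):
            k = 0
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            flags.append(1)              # 1 = repeated sequence
            lengths.append(best_len)     # "size" stream
            distances.append(best_dist)  # "location" stream
            i += best_len
        else:
            flags.append(0)              # 0 = literal byte
            literals.append(data[i])     # "data" stream
            i += 1
    return flags, bytes(literals), lengths, distances


def lz77_rebuild(flags, literals, lengths, distances) -> bytes:
    """Inverse of lz77_tokenize."""
    out = bytearray()
    lit = iter(literals)
    match = iter(zip(lengths, distances))
    for flag in flags:
        if flag:
            length, dist = next(match)
            for _ in range(length):
                out.append(out[-dist])   # byte by byte, so overlapping copies work
        else:
            out.append(next(lit))
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabcx"
    streams = lz77_tokenize(sample)
    print(streams)
    print(lz77_rebuild(*streams) == sample)
```

Keeping the three kinds of values apart is useful because each has its own statistics, so a later entropy coder can model each stream separately.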
BCJ / BCJ2 binary file compression
The LZMA SDK includes the BCJ and BCJ2 filters: for x86, ARM, PowerPC (PPC), IA64 and ARM Thumb processors, jump targets are normalized before compression. For x86, this means that near jumps, near calls and near conditional jumps (but not short jumps and short conditional jumps) are converted from the machine-language "jump 1655 bytes backward" style of notation to a normalized "jump to address 5554" style. Because every jump or call to the same target then contains the same address bytes, the dictionary stage can find more repeated sequences.
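The conversion from relative to absolute jump targets can be sketched in Python as follows. This is a simplified illustration, not the SDK's filter: it assumes the code starts at address 0, handles only the CALL rel32 (0xE8) and JMP rel32 (0xE9) opcodes, scans bytes naively without disassembling, and omits the heuristics a real filter would apply. The names `bcj_encode` and `bcj_decode` are hypothetical.

```python
import struct

NEAR_OPCODES = (0xE8, 0xE9)  # CALL rel32, JMP rel32


def _convert(code: bytes, encode: bool) -> bytes:
    buf = bytearray(code)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] in NEAR_OPCODES:
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = i + 5  # address of the instruction after the jump
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (value - next_ip) & 0xFFFFFFFF  # absolute -> relative
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so both directions scan the same positions
        else:
            i += 1
    return bytes(buf)


def bcj_encode(code: bytes) -> bytes:
    return _convert(code, encode=True)


def bcj_decode(code: bytes) -> bytes:
    return _convert(code, encode=False)


if __name__ == "__main__":
    # A CALL whose following instruction sits at address 7209 and which
    # jumps 1655 bytes backward, i.e. to address 5554.
    code = b"\x90" * 7204 + b"\xE8" + struct.pack("<i", -1655)
    encoded = bcj_encode(code)
    print(struct.unpack_from("<I", encoded, 7205)[0])
    print(bcj_decode(encoded) == code)
```

Since the opcode bytes are left untouched and the operand bytes are skipped in both directions, the decoder visits exactly the same positions as the encoder, which makes the transformation reversible even when a scanned byte is not really an instruction.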
7-Zip's BCJ2 assumes 32-bit displacements (addresses). By contrast, the UPX executable file compressor can also use 16-bit values when it detects 16-bit DOS binary file formats. The RAR compressor uses displacement compression for 32-bit x86 executables and IA64 (Itanium) executables.
The difference between BCJ and BCJ2 is that BCJ only translates near jump and call targets to their normalized form, whereas BCJ2 (x86 only) compresses near jump, near call and conditional near jump targets separately.
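The idea of handling targets separately can be sketched in Python as below. This is a hypothetical simplification, not the BCJ2 stream layout: it assumes the code starts at address 0, covers only CALL rel32 (0xE8) and JMP rel32 (0xE9), leaves out conditional near jumps, and simply returns the targets as Python lists instead of feeding them to a compressor. The names `bcj2_split` and `bcj2_join` are illustrative.

```python
def bcj2_split(code: bytes):
    """Split code into a main stream plus separate call and jump target lists."""
    main, calls, jumps = bytearray(), [], []
    i = 0
    while i < len(code):
        op = code[i]
        main.append(op)
        if op in (0xE8, 0xE9) and i + 5 <= len(code):
            rel = int.from_bytes(code[i + 1:i + 5], "little")
            target = (rel + i + 5) & 0xFFFFFFFF  # normalized (absolute) target
            (calls if op == 0xE8 else jumps).append(target)
            i += 5  # operand bytes go to a side list, not to the main stream
        else:
            i += 1
    return bytes(main), calls, jumps


def bcj2_join(main: bytes, calls, jumps) -> bytes:
    """Inverse of bcj2_split."""
    out = bytearray()
    ci = ji = 0
    for op in main:
        out.append(op)
        if op == 0xE8 and ci < len(calls):
            target, ci = calls[ci], ci + 1
        elif op == 0xE9 and ji < len(jumps):
            target, ji = jumps[ji], ji + 1
        else:
            continue  # ordinary byte, or an opcode too close to the end to carry an operand
        rel = (target - (len(out) + 4)) & 0xFFFFFFFF  # back to a relative displacement
        out += rel.to_bytes(4, "little")
    return bytes(out)


if __name__ == "__main__":
    code = (b"\x90\xE8" + (10).to_bytes(4, "little")
            + b"\xE9" + (0xFFFFFFF0).to_bytes(4, "little") + b"\xC3")
    main, calls, jumps = bcj2_split(code)
    print(main.hex(), calls, jumps)
    print(bcj2_join(main, calls, jumps) == code)
```

With the targets pulled out, the main stream contains only opcodes and other bytes, and each target list can be compressed with a model suited to addresses.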
7-Zip implementation
The reference implementation, which is available under the GNU LGPL, has the following properties:
- Compression speed: approximately 1 MB per second on a 2 GHz CPU
- Decompression speed: between 10 and 20 MB per second on a 2 GHz CPU
- Support for multi-threading and for the Pentium 4 microprocessor's hyper-threading feature
The decompression code for LZMA is around 5 KB, and the dynamic memory needed during decompression is modest (it depends on the dictionary size). These features make the decompression phase of the algorithm well suited to embedded applications.
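To see how the dictionary size travels with the compressed data, the following sketch uses Python's standard `lzma` module (which wraps liblzma, not the 7-Zip reference code) to write a standalone .lzma stream and read back its 13-byte header: one properties byte, a 32-bit dictionary size and a 64-bit uncompressed size, all little-endian. The helper names `compress_alone` and `read_alone_header` are hypothetical, and the chosen dictionary size is only an example.

```python
import lzma
import struct


def compress_alone(data: bytes, dict_size: int) -> bytes:
    """Compress to the legacy standalone .lzma container with a chosen dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters)


def read_alone_header(stream: bytes) -> dict:
    """Decode the 13-byte .lzma header."""
    props, dict_size, uncompressed_size = struct.unpack_from("<BIQ", stream, 0)
    return {
        "lc": props % 9,          # literal context bits
        "lp": (props // 9) % 5,   # literal position bits
        "pb": props // 45,        # position bits
        "dict_size": dict_size,   # the decoder needs a buffer of roughly this size
        # All 0xFF bytes here mean that the size is not stored in the header.
        "uncompressed_size": uncompressed_size,
    }


if __name__ == "__main__":
    payload = b"embedded systems like small dictionaries. " * 500
    stream = compress_alone(payload, dict_size=1 << 20)
    print(read_alone_header(stream))
    print(lzma.decompress(stream) == payload)
```

A smaller `dict_size` lowers the decoder's memory requirement at the cost of compression ratio on inputs whose repetitions lie further apart than the dictionary reaches.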
The use of Microsoft Windows-specific features is deeply buried in the source code, which makes it very difficult to create a Unix-compatible version. However, there are two working ports to Unix-like platforms:
- p7zip (http://sourceforge.net/projects/p7zip/) is a more-or-less complete port of the 7z and 7za command-line versions of 7-Zip to POSIX systems such as Unix (Linux, Solaris, OpenBSD, FreeBSD, Cygwin, ...), Mac OS X and BeOS.
- LZMA Unix Port (http://martinus.geekisp.com/rublog.cgi/Projects/LZMA) is a port of only the LZMA code, used to create a stream-based compression utility similar to gzip. This tool is not an archiving utility, so its format is a plain one (and not equivalent to a raw LZMA stream from 7-Zip, due to a missing UInt64 specifying the uncompressed file size at the end of the header).
7-Zip uses a more flexible archive format, 7z, and thus 7-Zip and the LZMA Unix Port cannot use the files the other creates, at least for now.
There is a Mac OS X port of 7-Zip called Compress (not related to the old archiving format), but it is buggy at best.
The PyLZMA Python Wrapper (http://www.joachim-bauch.de/projects/python/pylzma) supports compression and decompression on the Windows and Linux platforms.
Some embedded router, DSL and wireless devices (such as the US Robotics 9105 and 9106) run a modified version of Linux (the source code is available on the USR website (http://www.usr.com/support/s-gpl-code.asp); apparently the source comes from Broadcom) that boots from a filesystem which is basically cramfs, modified to use LZMA compression instead of zlib. They seem to use a thick layer of glue code around the reference decompression code (cramfs is a read-only filesystem, like ISO 9660, the standard compact disc filesystem). Modified cramfs tools (http://babel.ls.fi.upm.es/~aacosta/twiki/bin/view/Projects/CramfsPatches) are available to deal with such LZMA cramfs filesystem images.
External links
- 7-Zip Official Web-Site (http://www.7-zip.org/)
- LZMA SDK (http://www.7-zip.org/sdk.html)
- p7zip Unix port of command-line utilities (http://sourceforge.net/projects/p7zip/)
- LZMA Unix Port (http://martinus.geekisp.com/rublog.cgi/Projects/LZMA)
- PyLZMA Python Wrapper (http://www.joachim-bauch.de/projects/python/pylzma)
- Nullsoft Installer uses LZMA (http://nsis.sourceforge.net/)
- Inno Setup supports LZMA (http://www.jrsoftware.org/isinfo.php/)
- Compress home page (http://www.fromconcentratesoftware.com/Compress/)
- LZMA support for cramfs filesystem (http://babel.ls.fi.upm.es/~aacosta/twiki/bin/view/Projects/CramfsPatches)
- Ultimate Packer for eXecutables (http://upx.sourceforge.net/)