Lempel–Ziv–Markov chain algorithm

LZMA, LZMA2, Lempel-Ziv-Markov chain algorithm, LZMA algorithm, LZMA/LZMA2
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression.

7z

7-Zip
It has been under development since either 1996 or 1998 by Igor Pavlov and was first used in the 7z format of the 7-Zip archiver.
The latest stable version of 7-Zip and LZMA SDK is version 19.00.

Lossless compression

lossless, lossless data compression, compression
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression.

LZ77 and LZ78

LZ77, Lempel-Ziv, Lempel–Ziv
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms. LZMA uses a dictionary compression algorithm (a variant of LZ77 with huge dictionary sizes and special support for repeatedly used match distances), whose output is then encoded with a range encoder, using a complex model to make a probability prediction of each bit.
These two algorithms form the basis for many variations including LZW, LZSS, LZMA and others.
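The LZ77-style dictionary step described above can be illustrated with a toy tokenizer. This is a minimal sketch, not LZMA itself: the function names, the brute-force search, the 4096-byte window and the minimum match length of 3 are all assumptions chosen for readability, and LZMA's special handling of repeated match distances is left out.

```python
def lz77_encode(data: bytes, window: int = 4096, min_len: int = 3):
    """Greedy LZ77-style tokenizer (hypothetical helper, brute-force search).

    Produces ('lit', byte) and ('match', distance, length) tokens.
    """
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for cand in range(max(0, pos - window), pos):
            n = 0
            # The match may run past `pos`, i.e. overlap the text being encoded.
            while pos + n < len(data) and data[cand + n] == data[pos + n]:
                n += 1
            if n > 0 and n >= best_len:  # on ties keep the smallest distance
                best_len, best_dist = n, pos - cand
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte so overlapping copies work
    return bytes(out)


sample = b"abcabcabcabcxyz"
tokens = lz77_encode(sample)
print(tokens)
print(lz77_decode(tokens) == sample)
```

A real encoder replaces the brute-force loop with hash chains or binary trees and then entropy-codes the tokens; the token stream here is only the intermediate representation.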

Bzip2

BZIP, bunzip2, BZ2
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms.
LZMA is generally more space-efficient than bzip2 at the expense of even slower compression speed, while having much faster decompression.

XZ Utils

xz, liblzma, xz-compressed
The description below is based on the compact XZ Embedded decoder by Lasse Collin included in the Linux kernel source from which the LZMA and LZMA2 algorithm details can be relatively easily deduced: thus, while citing source code as reference isn't ideal, any programmer should be able to check the claims below with a few hours of work.
XZ Utils (previously LZMA Utils) is a set of free software command-line lossless data compressors, including LZMA and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows.

7-Zip

Igor Pavlov, p7zip, 7Zip
It has been under development since either 1996 or 1998 by Igor Pavlov and was first used in the 7z format of the 7-Zip archiver. It was originally dual-licensed under both the GNU LGPL and Common Public License, with an additional special exception for linked binaries, but was placed by Igor Pavlov in the public domain on December 2, 2008, with the release of version 4.62.
The core 7z compression uses a variety of algorithms, the most common of which are bzip2, PPMd, LZMA2, and LZMA.

Abraham Lempel

Lempel
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms.
The LZ77 and LZ78 algorithms authored by Lempel and Jacob Ziv have led to a number of derivative works, including the Lempel–Ziv–Welch algorithm, used in the GIF image format, and the Lempel-Ziv-Markov chain algorithm, used in the 7-Zip and xz compressors.

Markov chain

Markov process, Markov chains, continuous-time Markov process
LZMA uses Markov chains, as implied by "M" in its name.
The LZMA lossless data compression algorithm combines Markov chains with Lempel-Ziv compression to achieve very high compression ratios.
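One place where this history-dependent modelling becomes concrete is the small state machine that LZMA decoders keep to remember which kinds of packets were seen most recently; the state then selects which probability contexts are used for the next bits. The sketch below follows the 12-state transition rules as they appear in the XZ Embedded decoder. The function and constant names are hypothetical, and the exact state numbering should be treated as an assumption to check against the decoder source.

```python
# Packet kinds (hypothetical names for this sketch).
LIT, MATCH, REP, SHORT_REP = "lit", "match", "rep", "short_rep"

LIT_STATES = 7  # states 0..6 mean "the previous packet was a literal"


def next_state(state: int, packet: str) -> int:
    """Return the next of the 12 states after coding `packet`."""
    if packet == LIT:
        if state < 4:
            return 0
        if state < 10:
            return state - 3
        return state - 6
    if packet == MATCH:
        return 7 if state < LIT_STATES else 10
    if packet == REP:
        return 8 if state < LIT_STATES else 11
    if packet == SHORT_REP:
        return 9 if state < LIT_STATES else 11
    raise ValueError(f"unknown packet kind: {packet!r}")


state = 0
for packet in [MATCH, LIT, LIT, LIT, REP, REP, LIT]:
    state = next_state(state, packet)
    print(packet, "->", state)
```

The next state depends only on the current state and the packet just coded, which is the Markov-chain flavour referred to in the name.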

Lzip

lzip is a free, command-line tool for the compression of data; it employs the Lempel–Ziv–Markov chain algorithm (LZMA) with a user interface that is familiar to users of the usual Unix compression tools, such as gzip and bzip2.

Zip (file format)

ZIP, ZIP file, .zip
Furthermore, compared to classic dictionary compression (such as the one used in zip and gzip formats), the dictionary sizes can be and usually are much larger, taking advantage of the large amount of memory available on modern systems.
The .ZIP File Format Specification documents the following compression methods: Store (no compression), Shrink, Reduce (levels 1-4), Implode, Deflate, Deflate64, bzip2, LZMA (EFS), WavPack, and PPMd.
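The effect of dictionary size can be observed with Python's standard `lzma` module, which wraps liblzma. The sketch below is only an illustration under stated assumptions: the helper name `xz_size`, the 256 KiB block and the two dictionary sizes are arbitrary choices. The input is an incompressible block followed by an exact copy of itself, so the copy can only be encoded as a match if the dictionary reaches back at least one block length.

```python
import lzma
import random


def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size of `data` in the .xz container with a given LZMA2 dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


rng = random.Random(0)               # fixed seed so the run is repeatable
block_len = 256 * 1024
block = rng.getrandbits(8 * block_len).to_bytes(block_len, "little")
data = block + block                 # second half repeats the first at distance 256 KiB

small = xz_size(data, 64 * 1024)     # dictionary shorter than the repeat distance
large = xz_size(data, 1024 * 1024)   # dictionary long enough to reach the first copy
print("input:", len(data), "64 KiB dict:", small, "1 MiB dict:", large)
```

With the small dictionary the second copy is out of reach and the output stays close to the input size; with the larger one it shrinks to roughly half.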

WinZip

ZIPX
WinZip 12.0 (2008) added support for creating ZIP archives with lossless JPEG and LZMA compression methods, and for extracting .ISO, .IMG, and 7-Zip archives.

Algorithm

algorithms, algorithm design, computer algorithm
The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression.

Dictionary coder

dictionary coding, dictionary, dictionary compression
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms.

Yaakov Ziv

Jacob Ziv, Ziv
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms.

Gigabyte

GB, gigabytes, GiB
This algorithm uses a dictionary compression scheme somewhat similar to the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977 and features a high compression ratio (generally higher than bzip2) and a variable compression-dictionary size (up to 4 GB), while still maintaining decompression speed similar to other commonly used compression algorithms.

Range encoding

range encoder, range coding, range coder
LZMA uses a dictionary compression algorithm (a variant of LZ77 with huge dictionary sizes and special support for repeatedly used match distances), whose output is then encoded with a range encoder, using a complex model to make a probability prediction of each bit.
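The bit-by-bit range coding step can be sketched as an adaptive binary range coder. The code below follows the general shape used by common LZMA implementations (11-bit probabilities, an adaptation shift of 5, byte-wise renormalisation with carry propagation), but the class and method names are hypothetical and it codes every bit with a single probability, whereas LZMA selects the probability from a much richer context. Treat it as a sketch, not a compatible implementation.

```python
PROB_BITS = 11                      # probabilities are 11-bit fixed-point values
PROB_INIT = 1 << (PROB_BITS - 1)    # start at P(bit = 0) = 0.5
MOVE_BITS = 5                       # adaptation speed
TOP = 1 << 24                       # renormalise when range drops below this


class RangeEncoder:
    def __init__(self):
        self.low = 0                # may temporarily hold a carry in bit 32
        self.range = 0xFFFFFFFF
        self.cache = 0              # last byte not yet written (carry may still change it)
        self.cache_size = 1         # cache byte plus any pending 0xFF bytes
        self.out = bytearray()

    def _shift_low(self):
        if self.low < 0xFF000000 or self.low >= (1 << 32):
            carry = self.low >> 32
            byte = self.cache
            for _ in range(self.cache_size):
                self.out.append((byte + carry) & 0xFF)
                byte = 0xFF
            self.cache_size = 0
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def encode_bit(self, probs, i, bit):
        bound = (self.range >> PROB_BITS) * probs[i]
        if bit == 0:
            self.range = bound
            probs[i] += ((1 << PROB_BITS) - probs[i]) >> MOVE_BITS
        else:
            self.low += bound
            self.range -= bound
            probs[i] -= probs[i] >> MOVE_BITS
        while self.range < TOP:
            self.range <<= 8
            self._shift_low()

    def finish(self) -> bytes:
        for _ in range(5):          # flush the remaining bytes of `low`
            self._shift_low()
        return bytes(self.out)


class RangeDecoder:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
        self.range, self.code = 0xFFFFFFFF, 0
        for _ in range(5):          # the first byte is always 0 and shifts out
            self.code = ((self.code << 8) | self._next_byte()) & 0xFFFFFFFF

    def _next_byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def decode_bit(self, probs, i) -> int:
        bound = (self.range >> PROB_BITS) * probs[i]
        if self.code < bound:
            self.range = bound
            probs[i] += ((1 << PROB_BITS) - probs[i]) >> MOVE_BITS
            bit = 0
        else:
            self.code -= bound
            self.range -= bound
            probs[i] -= probs[i] >> MOVE_BITS
            bit = 1
        while self.range < TOP:
            self.range <<= 8
            self.code = ((self.code << 8) | self._next_byte()) & 0xFFFFFFFF
        return bit


bits = [1 if i % 20 == 0 else 0 for i in range(10_000)]   # skewed toy input
enc, enc_probs = RangeEncoder(), [PROB_INIT]
for b in bits:
    enc.encode_bit(enc_probs, 0, b)
packed = enc.finish()

dec, dec_probs = RangeDecoder(packed), [PROB_INIT]
decoded = [dec.decode_bit(dec_probs, 0) for _ in bits]
print(len(packed), "bytes for", len(bits), "bits; round trip ok:", decoded == bits)
```

Encoder and decoder update the same probability in the same way after every bit, so the decoder can track the model without any side information.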

Dynamic programming

dynamic, dynamic contracting problems, dynamic programming (DP)
The dictionary compressor finds matches using sophisticated dictionary data structures, and produces a stream of literal symbols and phrase references, which is encoded one bit at a time by the range encoder: many encodings are possible, and a dynamic programming algorithm is used to select an optimal one under certain approximations.
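The idea of choosing among many possible encodings with dynamic programming can be sketched as a shortest-path computation over byte positions. Everything specific here is an assumption made for illustration: the fixed costs of 9 bits per literal and 24 bits per match, the brute-force match search, and the restriction to the distance of the longest match. Real encoders estimate prices from the current range-coder probabilities and also consider repeated distances.

```python
def longest_match(data: bytes, pos: int, window: int = 4096):
    """Brute-force longest match before `pos`; returns (length, distance)."""
    best_len, best_dist = 0, 0
    for cand in range(max(0, pos - window), pos):
        n = 0
        while pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > 0 and n >= best_len:
            best_len, best_dist = n, pos - cand
    return best_len, best_dist


def optimal_parse(data: bytes, lit_cost=9.0, match_cost=24.0, min_len=2):
    """Cheapest literal/match parse under fixed, hypothetical bit costs."""
    n = len(data)
    cost = [float("inf")] * (n + 1)   # cost[i] = cheapest way to encode data[:i]
    step = [None] * (n + 1)           # token that ends at position i on that path
    cost[0] = 0.0
    for i in range(n):
        if cost[i] + lit_cost < cost[i + 1]:
            cost[i + 1] = cost[i] + lit_cost
            step[i + 1] = ("lit", data[i])
        max_len, dist = longest_match(data, i)
        for length in range(min_len, max_len + 1):   # shorter prefixes are valid too
            if cost[i] + match_cost < cost[i + length]:
                cost[i + length] = cost[i] + match_cost
                step[i + length] = ("match", dist, length)
    tokens, i = [], n
    while i > 0:                      # walk back along the cheapest path
        tok = step[i]
        tokens.append(tok)
        i -= 1 if tok[0] == "lit" else tok[2]
    tokens.reverse()
    return tokens, cost[n]


def decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            for _ in range(tok[2]):
                out.append(out[-tok[1]])
    return bytes(out)


text = b"abracadabra abracadabra abracadabra"
tokens, total = optimal_parse(text)
print(tokens, total, decode(tokens) == text)
```

Under these costs a two-byte match (24 bits) is more expensive than two literals (18 bits), so the parse skips short matches that a greedy encoder with the same minimum length would take.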

Gzip

gunzip, zcat, .tgz
Furthermore, compared to classic dictionary compression (such as the one used in zip and gzip formats), the dictionary sizes can be and usually are much larger, taking advantage of the large amount of memory available on modern systems.

Binary tree

complete binary tree, binary trees, perfect binary tree
The binary tree approach follows the hash chain approach, except that it logically uses a binary tree instead of a linked list for chaining.

Search tree

search trees, balanced search tree, search tree property
The binary tree is maintained so that it is always both a search tree relative to the suffix lexicographic ordering, and a max-heap for the dictionary position (in other words, the root is always the most recent string, and a child cannot have been added more recently than its parent): assuming all strings are lexicographically ordered, these conditions clearly uniquely determine the binary tree (this is trivially provable by induction on the size of the tree).
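The two invariants above (search tree on suffix order, max-heap on position) can be maintained by always inserting the newest string at the root and splitting the old tree around it. The sketch below is an illustration with hypothetical names; it compares whole suffixes and has no window or depth limit, which real match finders do impose. While descending, it also records the longest common prefix seen, which serves as the match candidate.

```python
def common_prefix_len(data: bytes, a: int, b: int) -> int:
    """Length of the common prefix of data[a:] and data[b:], assuming a < b."""
    n = 0
    while b + n < len(data) and data[a + n] == data[b + n]:
        n += 1
    return n


class MatchTree:
    """Binary tree of suffix start positions: search tree on suffix order, max-heap on position."""

    def __init__(self, data: bytes):
        self.data = data
        self.left, self.right = {}, {}
        self.root = None

    def insert(self, pos: int):
        """Make `pos` the new root; return (best_len, best_pos) among older positions."""
        data = self.data
        best_len, best_pos = 0, None
        self.left[pos] = self.right[pos] = None
        less_slot = (self.left, pos)       # open link where the next smaller subtree attaches
        greater_slot = (self.right, pos)   # open link where the next greater subtree attaches
        node = self.root
        while node is not None:
            n = common_prefix_len(data, node, pos)
            if n > best_len:               # first hit is the most recent position
                best_len, best_pos = n, node
            if pos + n < len(data) and data[node + n] < data[pos + n]:
                less_slot[0][less_slot[1]] = node      # node's suffix sorts before the new one
                less_slot = (self.right, node)
                node = self.right[node]
            else:
                greater_slot[0][greater_slot[1]] = node
                greater_slot = (self.left, node)
                node = self.left[node]
        less_slot[0][less_slot[1]] = None
        greater_slot[0][greater_slot[1]] = None
        self.root = pos
        return best_len, best_pos


text = b"abracadabra abracadabra"
tree = MatchTree(text)
for p in range(len(text)):
    length, where = tree.insert(p)
    if length >= 3:
        print(f"pos {p}: match of length {length} at distance {p - where}")
```

Because the lexicographic predecessor and successor of the new string both lie on the descent path, the longest match over all older positions is found during the same walk that restructures the tree.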

Radix tree

Patricia tree, Patricia trie, compact prefix tree
Some old LZMA encoders also supported a data structure based on Patricia tries, but that support was later dropped because it was deemed inferior to the other options.

GNU Lesser General Public License

LGPL, GNU LGPL, LGPLv2.1
It was originally dual-licensed under both the GNU LGPL and Common Public License, with an additional special exception for linked binaries, but was placed by Igor Pavlov in the public domain on December 2, 2008, with the release of version 4.62.

Common Public License

CPL, Common Public, CPL 1.0
It was originally dual-licensed under both the GNU LGPL and Common Public License, with an additional special exception for linked binaries, but was placed by Igor Pavlov in the public domain on December 2, 2008, with the release of version 4.62.

Public domain

public domain resource, public-domain, PD
It was originally dual-licensed under both the GNU LGPL and Common Public License, with an additional special exception for linked binaries, but was placed by Igor Pavlov in the public domain on December 2, 2008, with the release of version 4.62.