Huffman coding

Huffman; Huffman code; Huffman encoding; Huffman tree; Huffman algorithm; huffman coded; Huffman entropy coding; Length-limited Huffman code; Variable-Length Decoding; Variable-Length Decoding (VLD)
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.

Prefix code

prefix-free code; prefix codes; prefix-free
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm.
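
The defining property of a prefix code, that no code word is a prefix of another, can be checked mechanically. The following is a minimal sketch; the function name `is_prefix_free` is illustrative and code words are assumed to be given as strings of '0' and '1'.

```python
def is_prefix_free(codewords):
    """Return True if no code word is a prefix of (or equal to) another."""
    ordered = sorted(codewords)
    # After lexicographic sorting, if any word is a prefix of another,
    # then it is also a prefix of its immediate successor in the list.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(is_prefix_free(["0", "10", "110", "111"]))  # a valid prefix code
    print(is_prefix_free(["0", "01", "11"]))          # "0" is a prefix of "01"
```

A code that passes this check can be decoded left to right without separators between code words.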

Lossless compression

lossless; lossless data compression; compression
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
The primary encoding algorithms used to produce bit sequences are Huffman coding (also used by DEFLATE) and arithmetic coding.

David A. Huffman

Huffman, David A.
The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
David Albert Huffman (August 9, 1925 – October 7, 1999) was an American pioneer in computer science, known for his Huffman coding.

Variable-length code

VLD; uniquely decodable; variable length coding
The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file).
Some examples of well-known variable-length coding strategies are Huffman coding, Lempel–Ziv coding, arithmetic coding, and context-adaptive variable-length coding.

Entropy encoding

entropy coding; entropy coded; entropy coder
As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols.
Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding.

Binary tree

complete binary tree; binary trees; perfect binary tree
Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient.
Common examples occur with Huffman coding and cladograms.

Shannon–Fano coding

Shannon-Fano coding; Shannon-Fano; Shannon-Fano codes
Building the tree from the bottom up guaranteed optimality, unlike top-down Shannon-Fano coding.
It is suboptimal in the sense that, unlike Huffman coding, it does not always achieve the lowest possible expected code word length.

Canonical Huffman code

canonical encoding; canonical Huffman tables
If the data is compressed using canonical encoding, the compression model can be precisely reconstructed with just B·2^B bits of information (where B is the number of bits per symbol).
A canonical Huffman code is a particular type of Huffman code with unique properties which allow it to be described in a very compact manner.
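
The compactness comes from the fact that a canonical code is fully determined by the code length of each symbol. The sketch below shows one common assignment rule; it is an illustration rather than the definition used by any particular file format. It assumes symbols are sortable, every length is at least 1, and the lengths satisfy the Kraft inequality. The name `canonical_codes` is illustrative.

```python
def canonical_codes(lengths):
    """Assign canonical code words given {symbol: code length in bits}."""
    code = 0
    prev_len = 0
    table = {}
    # Process symbols by increasing length, breaking ties by symbol order.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)        # append zeros when the length grows
        table[sym] = format(code, "0{}b".format(length))
        code += 1                           # next code word of the same length
        prev_len = length
    return table


if __name__ == "__main__":
    print(canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3}))
```

A decoder that knows only the lengths can rebuild the same table by running the same procedure.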

Arithmetic coding

arithmetic coder; arithmetic encoding; arithmetic code
Other methods such as arithmetic coding often have better compression capability.
Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q where 0.0 ≤ q < 1.0.
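
The interval-narrowing idea can be sketched with exact rational arithmetic. This is a toy illustration under stated assumptions, not a practical coder: the symbol probabilities are known in advance and passed as exact fractions, and the names `arithmetic_interval` and `probs` are illustrative. Real coders use fixed-precision integers with renormalization.

```python
from fractions import Fraction


def arithmetic_interval(message, probs):
    """Return the interval [low, high) that identifies `message`.

    `probs` maps each symbol to its probability; any fraction q with
    low <= q < high encodes the whole message.
    """
    # Give each symbol its own sub-range of [0, 1), in dictionary order.
    ranges = {}
    cumulative = Fraction(0)
    for sym, p in probs.items():
        p = Fraction(p)
        ranges[sym] = (cumulative, cumulative + p)
        cumulative += p

    low, width = Fraction(0), Fraction(1)
    for sym in message:
        sym_low, sym_high = ranges[sym]
        # Narrow the current interval to the sub-range of this symbol.
        low, width = low + width * sym_low, width * (sym_high - sym_low)
    return low, low + width


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(arithmetic_interval("ab", {"a": half, "b": half}))
```

A decoder also needs the message length or an end-of-message symbol, since every extension of the message maps into the same interval.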

Entropy (information theory)

entropy; information entropy; Shannon entropy
We will not verify that it minimizes L over all codes, but we will compute L and compare it to the Shannon entropy H of the given set of weights; the result is nearly optimal.
Entropy effectively bounds the performance of the strongest lossless compression possible, which can be realized in theory by using the typical set or in practice using Huffman, Lempel–Ziv or arithmetic coding.
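
The comparison between the average code word length L and the entropy H can be reproduced numerically. The sketch below assumes a list of non-negative weights and a matching list of code lengths; the function names and the example weights are illustrative and are not taken from the source article.

```python
import math


def shannon_entropy(weights):
    """Entropy H in bits per symbol of the distribution given by `weights`."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def average_length(weights, lengths):
    """Weighted average code word length L in bits per symbol."""
    total = sum(weights)
    return sum(w * n for w, n in zip(weights, lengths)) / total


if __name__ == "__main__":
    weights = [4, 2, 1, 1]      # probabilities 1/2, 1/4, 1/8, 1/8
    lengths = [1, 2, 3, 3]      # code lengths of a prefix code for them
    print(shannon_entropy(weights), average_length(weights, lengths))
```

For any uniquely decodable code L is at least H, with equality when every probability is a power of 1/2, as in this example.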

Package-merge algorithm

The coin collector's problem
The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm.
The package-merge algorithm is an O(nL)-time algorithm for finding an optimal length-limited Huffman code for a given distribution on a given alphabet of size n, where no code word is longer than L.
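
A compact way to see the packaging and merging steps is the sketch below, which returns only the code lengths. It assumes strictly positive weights and re-sorts the list in every round, so it runs in O(L n log n) time rather than the O(nL) bound quoted above. The name `package_merge` is illustrative.

```python
def package_merge(weights, max_len):
    """Optimal code lengths with no code word longer than `max_len`.

    Assumes strictly positive weights. Returns a list of lengths in the
    same order as `weights`.
    """
    n = len(weights)
    if n > 2 ** max_len:
        raise ValueError("too many symbols for this length limit")
    if n <= 1:
        return [1] * n

    # Each item is (weight, tuple of symbol indices it contains).
    leaves = sorted((w, (i,)) for i, w in enumerate(weights))
    current = leaves
    for _ in range(max_len - 1):
        # Package: pair up adjacent items, dropping an unpaired last item.
        packages = [
            (current[k][0] + current[k + 1][0],
             current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # Merge: combine the packages with a fresh copy of the leaves.
        current = sorted(leaves + packages)

    # The code length of a symbol is the number of times it occurs among
    # the 2n - 2 cheapest items of the final list.
    lengths = [0] * n
    for _, members in current[: 2 * n - 2]:
        for i in members:
            lengths[i] += 1
    return lengths


if __name__ == "__main__":
    print(package_merge([1, 1, 2, 4], 3))   # the limit does not bind
    print(package_merge([1, 1, 2, 4], 2))   # the limit forces equal lengths
```

The lengths can then be turned into actual code words with any scheme that builds a prefix code from lengths, such as a canonical assignment.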

Adaptive Huffman coding

adaptive Huffman codes; algorithm FGK; algorithm V
A variation called adaptive Huffman coding involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates.
Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding.

Garsia–Wachs algorithm

A later method, the Garsia–Wachs algorithm of Adriano Garsia and Michelle L. Wachs (1977), uses simpler logic to perform the same comparisons in the same total time bound.
The Garsia–Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes, in linearithmic time.

DEFLATE

deflated; deflate-decode; Deflate64
DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by the use of prefix codes; these are often called "Huffman codes" even though most applications use pre-defined variable-length codes rather than codes designed using Huffman's algorithm.
In computing, Deflate is a lossless data compression file format that uses a combination of LZSS and Huffman coding.

Priority queue

priority queuing; Priority queues; min-priority queue
The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:
Huffman coding requires one to repeatedly obtain the two lowest-frequency trees.
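
The priority-queue construction can be sketched with Python's `heapq` module as the min-priority queue. This is a minimal illustration, not a reference implementation: the function name `huffman_code`, the tuple representation of tree nodes, and the convention of giving a lone symbol the code "0" are all assumptions made for the example.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a code table {symbol: bit string} from {symbol: weight}."""
    if not freqs:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(w, next(tie), ("leaf", sym)) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2][1]: "0"}

    # Repeatedly remove the two lowest-weight trees and merge them.
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), ("node", t1, t2)))

    # Walk the finished tree, appending 0 for left and 1 for right.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if node[0] == "leaf":
            codes[node[1]] = prefix
        else:
            stack.append((node[1], prefix + "0"))
            stack.append((node[2], prefix + "1"))
    return codes


if __name__ == "__main__":
    print(huffman_code({"a": 5, "b": 2, "c": 1, "d": 1}))
```

With a binary heap each of the n - 1 merges costs O(log n), giving O(n log n) time overall.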

Golomb coding

Rice coding; Golomb code; Golomb Rice code
For the simple case of Bernoulli processes, Golomb coding is optimal among prefix codes for coding run length, a fact proved via the techniques of Huffman coding.
The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were possible to compute the Huffman code.
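
A Golomb code word consists of a unary quotient followed by a truncated-binary remainder. The sketch below assumes one common convention, in which the quotient is written as that many 1 bits followed by a 0; other presentations swap the roles of 0 and 1. The name `golomb_encode` and the parameter name `m` are illustrative.

```python
def golomb_encode(n, m):
    """Golomb code word (as a bit string) for integer n >= 0, parameter m >= 1."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"               # quotient in unary
    if m == 1:
        return unary                    # no remainder bits are needed
    b = (m - 1).bit_length()            # ceil(log2(m))
    cutoff = (1 << b) - m               # number of remainders that get b - 1 bits
    if r < cutoff:
        return unary + format(r, "0{}b".format(b - 1))
    return unary + format(r + cutoff, "0{}b".format(b))


if __name__ == "__main__":
    for n in range(6):
        print(n, golomb_encode(n, 3))
```

When m is a power of two the cutoff is zero and every remainder takes exactly b bits, which is the special case known as Rice coding.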

JPEG

JPG; .jpg; JPG/JPEG
DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by the use of prefix codes; these are often called "Huffman codes" even though most applications use pre-defined variable-length codes rather than codes designed using Huffman's algorithm.
The specification also cites a 1984 paper by Wen-Hsiung Chen and W.K. Pratt as an influence on its quantization algorithm, and David A. Huffman's 1952 paper for its Huffman coding algorithm.

Modified Huffman coding

Modified Huffman
A similar approach is taken by fax machines using modified Huffman coding.
It combines the variable length codes of Huffman coding with the coding of repetitive data in run-length encoding.

Greedy algorithm

greedy; greedily; greedy heuristic
The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm.
Examples of such greedy algorithms are Kruskal's algorithm and Prim's algorithm for finding minimum spanning trees, and the algorithm for finding optimum Huffman trees.

Alan Tucker

Alan C. Tucker; Tucker, Alan
This is also known as the Hu–Tucker problem, after T. C. Hu and Alan Tucker, the authors of the paper presenting the first O(n log n)-time solution to this optimal binary alphabetic problem, which has some similarities to Huffman's algorithm but is not a variation of it.
Hu–Tucker coding

Binary search tree

binary search trees; BST; search tree
These optimal alphabetic binary trees are often used as binary search trees.
Such a tree might be compared with Huffman trees, which similarly seek to place frequently used items near the root in order to produce a dense information encoding; however, Huffman trees store data elements only in leaves, and these elements need not be ordered.

Computer science

computer scientist; computer sciences; computer scientists
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.

Information theory

information-theoretic; information theorist; information
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.

Doctor of Science

DSc; D.Sc.; Sc.D.
The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Massachusetts Institute of Technology

MIT; M.I.T.; Massachusetts Institute of Technology (MIT)
The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".