
related topics 
{math, number, function} 
{system, computer, user} 
{rate, high, increase} 
{woman, child, man} 
{school, student, university} 

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variablelength code table for encoding a source symbol (such as a character in a file) where the variablelength code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of MinimumRedundancy Codes".
Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called "prefixfree codes", that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if input probabilities (also known as weights) are sorted.^{[citation needed]}
For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
Although Huffman's original algorithm is optimal for a symbolbysymbol coding (i.e. a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbolbysymbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., "cat" is more common than "cta"). Other methods such as arithmetic coding and LZW coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, the latter of which is useful when input probabilities are not precisely known or vary significantly within the stream. However, the limitations of Huffman coding should not be overstated; it can be used adaptively, accommodating unknown, changing, or contextdependent probabilities. In the case of known independent and identicallydistributed random variables, combining symbols together reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.
Contents
Full article ▸


related documents 
Gaussian elimination 
Cardinal number 
Kernel (algebra) 
Hash function 
Lua (programming language) 
Numerical analysis 
Interval (mathematics) 
Absolute value 
Sequence alignment 
Denotational semantics 
Simplex 
RSA 
Complete lattice 
Proofs of Fermat's little theorem 
Entropy (information theory) 
Infinity 
Compass and straightedge constructions 
Frame problem 
Logic programming 
Factorial 
Stone–Čech compactification 
Ruby (programming language) 
Group action 
Central limit theorem 
Prime number theorem 
Series (mathematics) 
Abelian group 
Newton's method 
Dynamic programming 
Integration by parts 
