Prefix code

related topics
{math, number, function}
{language, word, form}
{system, computer, user}
{household, population, family}
{work, book, publish}

A prefix code is a code system, typically a variable-length code, with the "prefix property": there is no valid code word in the system that is a prefix (start) of any other valid code word in the set. A code with code words {9, 59, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of both "59" and "55". With a prefix code, a receiver can identify each word without requiring a special marker between words.

Prefix codes are also known as prefix-free codes, prefix condition codes and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term comma-free code is sometimes also applied as a synonym for prefix-free codes[1][2] but in most mathematical books and articles (e. g. [3][4]) it is used to mean self-synchronizing codes, a subclass of prefix codes.

Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any out-of-band markers to frame the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing prefixes that form valid code words. This is not possible with codes that lack the prefix property, for example {0, 1, 10, 11}: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11".

The variable-length Huffman codes, country calling codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard are prefix codes.

Prefix codes are not error-correcting codes. In practice, a message might first be compressed with a prefix code, and then encoded again with channel coding (including error correction) before transmission.

Kraft's inequality characterizes the sets of code word lengths that are possible in a prefix code.



Techniques for constructing a prefix code can be simple, or quite complicated.

If every word in the code has the same length, the code is called a fixed-length code, or a block code (though the term block code is also used for fixed-size error-correcting codes in channel coding). For example, ISO 8859-15 letters are always 8 bits long. UTF-32/UCS-4 letters are always 32 bits long. ATM packets are always 424 bits long. A block code of fixed length k bits can encode up to 2k source symbols.

Full article ▸

related documents
Context-sensitive grammar
Unary numeral system
Backus–Naur Form
Dublin Core
History of large numbers
Malleability (cryptography)
Commutative ring
Calculus with polynomials
Byte-order mark
Hash collision
GNU Octave
Weierstrass–Casorati theorem
Borel-Cantelli lemma
Residue (complex analysis)
Nowhere dense set
Category (mathematics)
Magma computer algebra system
Stirling number
Lex programming tool
Zeta distribution
Dyadic rational
Degenerate distribution
Bernoulli process
Search engine (computing)
Pseudometric space
Cayley's theorem