Lempel-Ziv-Welch

related topics
{style, bgcolor, rowspan}
{math, number, function}
{system, computer, user}
{language, word, form}
{@card@, make, design}
{work, book, publish}

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is designed to be fast to implement but is not usually optimal because it performs only limited analysis of the data.

Contents

Algorithm

Idea

The scenario described in Welch's 1984 paper[1] encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary.

The idea was quickly adapted to other situations. In an image based on a color table, for example, the natural character alphabet is the set of color table indexes, and in the 1980s, many images had small color tables (on the order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed maximum (typically 12 bits).

Further refinements include reserving a code to indicate that the code table should be cleared (a "clear code", typically the first value immediately after the values for the individual alphabet characters), and a code to indicate the end of data (a "stop code", typically one greater than the clear code). The clear code allows the table to be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data. Smart encoders can monitor the compression efficiency and clear the table whenever the existing table no longer matches the input well.

Full article ▸

related documents
Table of mathematical symbols
Wikipedia:Parser bug reports
Wikipedia:Special pages bug reports
Periodic table (extended)
Wikipedia:Traffic
Departments of France
List of saints
Duodecimal
United States congressional delegations from Alabama
List of rapid transit systems
1980
Riccardo Patrese
Atomic radius
List of Welsh language poets (6th century to c.1600)
Wikipedia:Stub
Mika Häkkinen
Prime Minister of New Zealand
List of elements by atomic number
Colleges of the University of Cambridge
Alain Prost
Woodridge, Illinois
Jefferson County, Kentucky
Savoy opera
Help:Interlanguage links
Wikipedia:Feature request (archive)
Wikipedia:Size comparisons
Boy Meets World
Pomeranian Voivodeship
List of counties in Colorado
BBC World Service