Data compression

related topics
{system, computer, user}
{math, number, function}
{theory, work, human}
{rate, high, increase}
{language, word, form}
{style, bgcolor, rowspan}

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through use of specific encoding schemes.

In computing, data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization.

Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and uncompress the data.


Lossless versus lossy compression

Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely without error. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small. Another kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, a lossy data compression will be guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.

Full article ▸

related documents
IBM 1401
Internet Message Access Protocol
IBM System/370
IBM AIX (operating system)
Instruction set
Blue Gene
Data Link Layer
Race condition
Terminate and Stay Resident
Computer networking
File server
LAN switching
Hercules Graphics Card
Exidy Sorcerer
Signaling System 7
Video Graphics Array
Video CD
Computer display standard
Gigabit Ethernet
General Packet Radio Service
Disk image
High fidelity