Unicity distance

In cryptography, unicity distance is the length of ciphertext needed to break a cipher by reducing the number of possible spurious keys to zero in a brute force attack; that is, after trying every possible key, there should be just one decipherment that makes sense. Equivalently, it is the expected amount of ciphertext needed to determine the key completely, assuming the underlying message has redundancy.

Consider an attack on the ciphertext string "WNAIW", encrypted using a Vigenère cipher with a five-letter key. Conceivably, this string could be deciphered into any other five-letter string; RIVER and WATER are both possibilities for certain keys. This is a general rule of cryptanalysis: with no additional information it is impossible to decode this message.

Of course, even in this case, only a certain number of five-letter keys will result in English words. Trying all possible keys, we will not only get RIVER and WATER, but SXOOS and KHDOP as well. The number of "working" keys will likely be very much smaller than the set of all possible keys. The problem is knowing which of these "working" keys is the right one; the rest are spurious.
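As a rough illustration of this ambiguity, the following Python sketch derives, for a few candidate five-letter plaintexts, the Vigenère key that would turn each of them into the ciphertext WNAIW (the candidate word list is an assumption chosen for the example):

```python
# Sketch: every five-letter plaintext corresponds to exactly one five-letter
# Vigenere key for the ciphertext WNAIW, so the ciphertext alone cannot
# single out the "right" decipherment.

A_ORD = ord('A')

def key_for(plaintext: str, ciphertext: str) -> str:
    """Return the Vigenere key that encrypts `plaintext` to `ciphertext`."""
    return ''.join(
        chr((ord(c) - ord(p)) % 26 + A_ORD)
        for p, c in zip(plaintext, ciphertext)
    )

ciphertext = "WNAIW"
candidates = ["RIVER", "WATER", "HOUSE", "BREAD"]  # assumed toy word list

for word in candidates:
    print(f"{word} <- {ciphertext} with key {key_for(word, ciphertext)}")
```

Each candidate yields some valid key (RIVER under FFFEF, WATER under ANHEF, and so on), which is why five characters of ciphertext by themselves cannot identify the correct decipherment.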

Relation with key size and possible plaintexts

In general, given any particular assumptions about the size of the key and the number of possible messages, there is an average ciphertext length where there is only one key (on average) that will generate a readable message. In the example above we see only upper case Roman characters, so if we assume this is the input then there are 26 possible letters for each position in the string. Likewise if we assume five-character upper case keys, there are K = 26^5 possible keys, of which the majority will not "work".

A tremendous number of possible messages, N, can be generated using even this limited set of characters: N = 26^L, where L is the length of the message. However, only a smaller set of them is readable plaintext due to the rules of the language, perhaps M of them, where M is likely to be very much smaller than N. Moreover, M has a one-to-one relationship with the number of keys that "work", so given K possible keys, only K × (M/N) of them will "work". One of these is the correct key; the rest are spurious.

Since N grows with the length of the message L while M grows far more slowly, the ratio M/N shrinks as L increases, so eventually some L is large enough that the expected number of spurious keys falls to zero; roughly, it is the L for which K × (M/N) = 1. This L is the unicity distance.
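To make this counting concrete, here is a minimal sketch for the five-letter-key Vigenère example above, under the assumption that readable English carries roughly 1.5 bits of information per character (so M ≈ 2^(1.5 L)); both that figure and the simple model are illustrative assumptions, not part of the argument itself:

```python
import math

# Assumed model: K = 26**5 keys, N = 26**L strings of length L, and
# M ~ 2**(1.5 * L) readable strings (1.5 bits/char is an assumed
# entropy rate for English text).
K = 26 ** 5
ENTROPY_PER_CHAR = 1.5  # bits per character, an assumption

def expected_working_keys(length: int) -> float:
    """Estimate K * (M / N), the expected number of keys giving readable text."""
    log2_m = ENTROPY_PER_CHAR * length
    log2_n = length * math.log2(26)
    return K * 2 ** (log2_m - log2_n)

for length in range(1, 11):
    spurious = max(expected_working_keys(length) - 1, 0)  # exclude the true key
    print(f"L = {length:2d}: expected spurious keys ~ {spurious:12.1f}")
```

Under these assumptions the expected number of spurious keys drops below one after about seven or eight characters of ciphertext, which is this cipher's approximate unicity distance.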

Relation with key entropy and plaintext redundancy

The unicity distance can also be defined as the minimum amount of ciphertext needed for a computationally unlimited adversary, mounting a ciphertext-only attack, to recover the unique encryption key.

The expected unicity distance is accordingly

U = H(k) / D

where U is the unicity distance, H(k) is the entropy of the key space (e.g. 128 bits for 2^128 equiprobable keys, rather less if the key is a memorized pass-phrase), and D is the plaintext redundancy in bits per character.

Now an alphabet of 32 characters can carry 5 bits of information per character (since 32 = 2^5). In general, an alphabet of A characters can carry log2(A) bits of information per character, where log2 denotes the binary logarithm (see Binary logarithm for details). So for English each character can convey log2(26) ≈ 4.7 bits of information.
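Applying the formula to the five-letter-key Vigenère example from earlier gives a rough figure; the sketch below assumes English text actually carries about 1.5 bits of information per character, so D ≈ 4.7 − 1.5 ≈ 3.2 bits per character (that entropy estimate is an assumption, not something fixed by the formula):

```python
import math

# Sketch of U = H(k) / D for a Vigenere cipher with a uniformly random
# five-letter key over a 26-letter alphabet.
bits_per_char = math.log2(26)      # ~4.7 bits each character could carry
entropy_rate = 1.5                 # assumed bits actually carried by English
D = bits_per_char - entropy_rate   # plaintext redundancy, ~3.2 bits/char

H_k = 5 * math.log2(26)            # key entropy of five random letters, ~23.5 bits

U = H_k / D
print(f"H(k) ~ {H_k:.1f} bits, D ~ {D:.1f} bits/char, U ~ {U:.1f} characters")
```

The result, roughly seven characters, agrees with the spurious-key count in the previous section: a handful of ciphertext characters is already enough, in principle, to pin down a five-letter Vigenère key.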
