UTF-16/UCS-2

UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064 numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.

The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.[1] It is a fixed-length format that simply uses the code point as the 16-bit code unit, and it produces exactly the same result as UTF-16 for 63,488 code points in the range 0-0xFFFF, including all characters that had been assigned a value in this range at that time.
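The two counts quoted above follow from simple arithmetic on the code space, assuming the 2,048 code points in the range U+D800..U+DFFF are excluded because they are reserved for surrogates. A minimal sketch in Python (the constant names are illustrative and not taken from any standard):

```python
# Illustrative constants; the names are assumptions made for this sketch.
CODE_SPACE_SIZE = 0x10FFFF + 1           # code points 0..0x10FFFF
BMP_SIZE = 0xFFFF + 1                    # code points 0..0xFFFF
SURROGATE_COUNT = 0xDFFF - 0xD800 + 1    # reserved range U+D800..U+DFFF

# Code points that UTF-16 can encode: everything except the surrogates.
utf16_encodable = CODE_SPACE_SIZE - SURROGATE_COUNT

# Code points for which UCS-2 and UTF-16 give the same single code unit.
ucs2_same_as_utf16 = BMP_SIZE - SURROGATE_COUNT

print(utf16_encodable)       # 1112064
print(ucs2_same_as_utf16)    # 63488
```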

UTF-16 is officially defined in Annex Q of the international standard ISO/IEC 10646-1. It is also described in "The Unicode Standard" version 2.0 and higher, as well as in the IETF's RFC 2781.

Description

Code points U+0000..U+D7FF and U+E000..U+FFFF

For these code points both UTF-16 and UCS-2 use a single 16-bit code unit that is numerically equal to the code point. These code points all lie in the Basic Multilingual Plane (BMP).
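This identity can be checked with Python's built-in "utf-16-le" codec. The sketch below is only an illustration; the helper name utf16_code_units is hypothetical and is not part of any library.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper for this sketch; it relies on the standard
    "utf-16-le" codec (little-endian, no byte order mark).
    """
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# Each of these characters is in U+0000..U+D7FF or U+E000..U+FFFF,
# so it encodes to one code unit equal to its code point.
for ch in ["A", "\u20ac", "\ue000"]:
    units = utf16_code_units(ch)
    assert units == [ord(ch)]
    print("U+%04X -> %s" % (ord(ch), [hex(u) for u in units]))
```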

Code points U+10000..U+10FFFF

Code points larger than 0xFFFF are called supplementary code points; they lie in the supplementary planes.

It is not possible to encode these code points in UCS-2.

UTF-16 converts these into two 16-bit code units, called a surrogate pair, by the following scheme:

  • 0x10000 is subtracted from the code point, leaving a 20-bit number in the range 0..0xFFFFF.
  • The top ten bits (a number in the range 0..0x3FF) are added to 0xD800 to give the first code unit or high surrogate, which will be in the range 0xD800..0xDBFF.
  • The low ten bits (also in the range 0..0x3FF) are added to 0xDC00 to give the second code unit or low surrogate, which will be in the range 0xDC00..0xDFFF.
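
The scheme above can be sketched in Python as follows. The function names are hypothetical, and the decoding function is simply the inverse of the three steps, added here so that the round trip can be checked.

```python
def to_surrogate_pair(code_point):
    """Encode a supplementary code point as (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit number, 0..0xFFFFF
    high = 0xD800 + (offset >> 10)       # top ten bits
    low = 0xDC00 + (offset & 0x3FF)      # low ten bits
    return high, low

def from_surrogate_pair(high, low):
    """Inverse of to_surrogate_pair (an addition made for this sketch)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1D11E)
print(hex(high), hex(low))                   # 0xd834 0xdd1e
print(hex(from_surrogate_pair(high, low)))   # 0x1d11e
```

For example, U+1D11E gives the offset 0xD11E, whose top ten bits are 0x34 and whose low ten bits are 0x11E, so the pair is 0xD834, 0xDD1E.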
