UTF-8 (UCS[1] Transformation Format, 8-bit) is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set; unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and the resulting need for byte order marks (BOMs). For these and other reasons, UTF-8 has become the dominant character encoding for the World Wide Web, accounting for more than half of all Web pages.[2][3] The Internet Engineering Task Force (IETF) requires all Internet protocols to identify the encoding used for character data, and the supported character encodings must include UTF-8.[4] The Internet Mail Consortium (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8.[5] UTF-8 is also increasingly used as the default character encoding in operating systems, programming languages, APIs, and software applications.
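
The endianness point can be illustrated with a minimal Python sketch. It relies only on the codec names shipped with Python's standard library; the sample character and the variable names are chosen purely for illustration.

```python
import codecs

text = "\u00e9"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE (illustrative sample)

# UTF-16 has two possible byte orders for the same 16-bit code unit.
utf16_le = text.encode("utf-16-le")  # low-order byte first
utf16_be = text.encode("utf-16-be")  # high-order byte first

# The generic "utf-16" codec prepends a byte order mark so that a decoder
# can tell which of the two byte orders was used.
utf16_with_bom = text.encode("utf-16")
has_bom = utf16_with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# UTF-8 is defined as a sequence of single bytes, so there is only one
# possible serialization and Python's "utf-8" codec writes no BOM.
utf8 = text.encode("utf-8")

print(utf16_le.hex(), utf16_be.hex(), utf8.hex(), has_bom)
```

The two UTF-16 serializations differ only in byte order, which is why a decoder needs either a BOM or out-of-band knowledge; the UTF-8 bytes are the same on every platform.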

UTF-8 encodes each of the 1,112,064[6] code points in the Unicode character set using one to four 8-bit bytes (termed “octets” in the Unicode Standard). Code points with lower numerical values (i.e., earlier code positions in the Unicode character set, which tend to occur more frequently in practice) are encoded using fewer bytes,[7] making the encoding scheme reasonably efficient. In particular, the first 128 characters of the Unicode character set, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as the corresponding ASCII character, so valid ASCII text is also valid UTF-8-encoded Unicode text.
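
The properties described in this paragraph can be checked with a short Python sketch. The sample characters and the helper name `utf8_length` are illustrative assumptions; the code-point count assumes the usual definition, namely the range U+0000 to U+10FFFF minus the 2,048 surrogate code points (U+D800 to U+DFFF), which UTF-8 does not encode.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# One sample from each length class (illustrative choices):
# U+0041 'A', U+00E9 e-acute, U+20AC euro sign, U+1F600 emoji.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]  # 1, 2, 3 and 4 bytes

# ASCII compatibility: every 7-bit byte value decodes to the same
# character whether it is read as ASCII or as UTF-8.
ascii_bytes = bytes(range(128))
ascii_compatible = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Number of encodable code points: the full code space minus surrogates.
total_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)

print(lengths, ascii_compatible, total_code_points)
```

Lower code points map to shorter sequences, and the 128 ASCII values pass through unchanged, which is what lets existing ASCII files be read as UTF-8 without conversion.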

The official IANA code for the UTF-8 character encoding is UTF-8.[8]


