The first 2^16 (65,536) Unicode code points. The stripe of solid gray near the bottom marks the surrogate halves used by UTF-16 (the white region below the stripe is the Private Use Area).
A map of the Basic Multilingual Plane. Each numbered box represents 256 code points.
A map of the Supplementary Multilingual Plane. Each numbered box represents 256 code points.
A map of the Supplementary Ideographic Plane. Each numbered box represents 256 code points.
A map of the Tertiary Ideographic Plane. Each numbered box represents 256 code points.
A map of the Supplementary Special-purpose Plane. Each numbered box represents 256 code points.

The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of 16-bit words, plus the BMP as single words.

- Plane (Unicode)
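The arithmetic behind the 17-plane limit can be checked in a few lines. This is only an illustrative sketch; the constant names are made up for the example and do not come from any standard.

```python
# Each surrogate half carries 10 bits of payload, so a pair carries 20 bits.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1024 possible high (lead) halves
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1024 possible low (trail) halves

PLANE_SIZE = 2 ** 16                    # code points per plane
pair_code_points = HIGH_SURROGATES * LOW_SURROGATES   # 2**20, i.e. 16 planes
total_code_points = PLANE_SIZE + pair_code_points     # BMP + supplementary planes

planes = total_code_points // PLANE_SIZE
max_code_point = total_code_points - 1

print(planes)                 # 17
print(hex(max_code_point))    # 0x10ffff
```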

A UTF-16 stream, therefore, consists of single 16-bit code units outside the surrogate range for code points in the Basic Multilingual Plane (BMP), and pairs of 16-bit code units within the surrogate range for code points above the BMP.

- UTF-16
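As a rough sketch of that rule, the function below maps one code point to its UTF-16 code units. The name `utf16_code_units` is hypothetical and chosen for this example; real programs would normally use a built-in codec instead.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units (as a list of ints) for one code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate halves are not encodable on their own")
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit code unit
    offset = cp - 0x10000                # 20-bit value, 0 .. 0xFFFFF
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x20AC)])    # ['0x20ac']
print([hex(u) for u in utf16_code_units(0x1F600)])   # ['0xd83d', '0xde00']
```

The result for U+1F600 matches what Python's own `utf-16-be` codec produces for the same character.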

Declared character set for the 10 million most popular websites since 2010

UTF-8

Variable-width character encoding used for electronic communication.

Use of the main encodings on the web from 2001 to 2012 as recorded by Google, with UTF-8 overtaking all others in 2008 and exceeding 60% of the web in 2012 (since then approaching 100%). The ASCII-only figure includes all web pages that contain only ASCII characters, regardless of the declared header. Other encodings such as GB2312 are added to "others".

Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use, including most Chinese, Japanese and Korean characters.

Since RFC 3629 (November 2003), the high and low surrogate halves used by UTF-16 (U+D800 through U+DFFF) and code points not encodable by UTF-16 (those after U+10FFFF) are not legal Unicode values, and their UTF-8 encoding must be treated as an invalid byte sequence.
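A quick way to see both points, the three-byte length for BMP characters and the rejection of surrogates and out-of-range values, is to ask a strict UTF-8 decoder. The sketch below relies on Python 3's built-in `utf-8` codec; the helper name `is_valid_utf8` is an assumption made for this example.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+4E2D, a common CJK ideograph in the BMP, takes three bytes.
print(len("\u4e2d".encode("utf-8")))           # 3

# U+10FFFF is the last legal code point; its four-byte form is accepted.
print(is_valid_utf8(b"\xf4\x8f\xbf\xbf"))      # True

# The three-byte pattern that would encode the surrogate U+D800 is rejected.
print(is_valid_utf8(b"\xed\xa0\x80"))          # False

# So is the four-byte pattern for the out-of-range value 0x110000.
print(is_valid_utf8(b"\xf4\x90\x80\x80"))      # False
```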

Logo of the Unicode Consortium

Unicode

Information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with upright, oblique and italic alternate forms

The Unicode standard defines Unicode Transformation Formats (UTF): UTF-8, UTF-16, and UTF-32, and several other encodings.

UCS-2 uses two bytes (16 bits) for each character but can only encode the first 65,536 code points, the so-called Basic Multilingual Plane (BMP).
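The UCS-2 limitation can be illustrated with a toy encoder. Python has no built-in UCS-2 codec, so `encode_ucs2_be` below is a hypothetical helper written for this example, assuming big-endian byte order; it simply refuses anything above U+FFFF.

```python
def encode_ucs2_be(text):
    """Encode text as fixed-width, big-endian 16-bit values (UCS-2 style)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate mechanism, so nothing beyond the BMP fits.
            raise ValueError("U+%04X is outside the BMP" % cp)
        out += cp.to_bytes(2, "big")
    return bytes(out)


print(encode_ucs2_be("A\u4e2d").hex())   # 00414e2d

try:
    encode_ucs2_be("\U0001F600")
except ValueError as err:
    print(err)                           # U+1F600 is outside the BMP
```

For BMP-only text the output is byte-for-byte the same as UTF-16 in the same byte order, which is why the two are often confused.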

Universal Character Set characters

The Unicode Consortium and ISO/IEC JTC 1/SC 2/WG 2 collaborate on the Universal Character Set (UCS).

Example of fraction slash use. This typeface (Apple Chancery) shows the synthesized common fraction on the left and the precomposed fraction glyph on the right as a rendering of the plain text string "1 1⁄4 1¼". Depending on the text environment, the single string "1 1⁄4" might yield either result, the one on the right through substitution of the fraction sequence with the single precomposed fraction glyph.
A more elaborate example of fraction slash usage: plain text "4 221⁄225" rendered in Apple Chancery. This font supplies the text layout software with instructions to synthesize the fraction according to the Unicode rule described in this section.

The UCS can be divided in various ways, such as by plane, block, character category, or character property.
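Two of these divisions are easy to compute. The sketch below derives the plane from the code point value and looks up the general category with Python's standard `unicodedata` module; block names are not exposed by that module, so they are left out. The helper name `plane_of` is made up for this example.

```python
import unicodedata


def plane_of(ch):
    """Return the plane number (0 to 16) of a single character."""
    return ord(ch) >> 16   # each plane spans 2**16 code points


for ch in ["A", "\u4e2d", "\U0001F600", "\U00020000"]:
    # Print the code point, its plane, and its two-letter general category.
    print("U+%04X" % ord(ch), plane_of(ch), unicodedata.category(ch))
```

Plane 0 is the BMP, plane 1 the Supplementary Multilingual Plane, and plane 2 the Supplementary Ideographic Plane, matching the maps listed at the top of the page.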

It is also not likely to be UTF-16 in little-endian byte order, because the bytes 0xFE, 0xFF read as a 16-bit little-endian word would be U+FFFE, which is a noncharacter.
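The byte-order reasoning can be sketched as follows: read the first two bytes both ways and see which reading gives the byte order mark U+FEFF. The function name `guess_utf16_byte_order` is hypothetical, and this is a heuristic for illustration only, not a full encoding detector.

```python
BOM = 0xFEFF   # byte order mark; its byte-swapped value 0xFFFE is a noncharacter


def guess_utf16_byte_order(data):
    """Return "big", "little", or None based on a leading byte order mark."""
    if len(data) < 2:
        return None
    first_two = data[:2]
    if int.from_bytes(first_two, "big") == BOM:
        return "big"
    if int.from_bytes(first_two, "little") == BOM:
        return "little"
    return None


# The bytes FE FF read as a little-endian word give 0xFFFE, not the BOM.
print(hex(int.from_bytes(b"\xfe\xff", "little")))    # 0xfffe
print(guess_utf16_byte_order(b"\xfe\xff\x00A"))      # big
print(guess_utf16_byte_order(b"\xff\xfeA\x00"))      # little
```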