UTF-16

The first 2^16 (65,536) Unicode code points. The solid gray stripe near the bottom marks the surrogate halves used by UTF-16; the white region below the stripe is the Private Use Area.

Character encoding capable of encoding all 1,112,064 valid character code points of Unicode (in fact this number of code points is dictated by the design of UTF-16).
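
The figure of 1,112,064 can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the usual layout of 17 planes of 65,536 code points each, with the surrogate range U+D800 through U+DFFF excluded; the variable names are illustrative only.

```python
# Sketch: derive the number of valid Unicode code points from UTF-16's design.
PLANES = 17                        # the BMP plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 2 ** 16    # 65,536 code points per plane
SURROGATES = 0xDFFF - 0xD800 + 1   # 2,048 code points reserved for UTF-16

total_code_points = PLANES * CODE_POINTS_PER_PLANE   # U+0000 through U+10FFFF
valid_code_points = total_code_points - SURROGATES   # surrogates are not characters

print(f"{valid_code_points:,}")
```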

A human computer, with microscope and calculator, 1952

Universal Coded Character Set

Standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard), which is the basis of many character encodings, improving as characters from previously unrepresented writing systems are added.

The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP.

Logo of the Unicode Consortium

Unicode

Information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with upright, oblique and italic alternate forms

The Unicode standard defines Unicode Transformation Formats (UTF): UTF-8, UTF-16, and UTF-32, and several other encodings.

A map of the Basic Multilingual Plane. Each numbered box represents 256 code points.

Plane (Unicode)

Continuous group of 65,536 code points.

A map of the Supplementary Multilingual Plane. Each numbered box represents 256 code points.
A map of the Supplementary Ideographic Plane. Each numbered box represents 256 code points.
A map of the Tertiary Ideographic Plane. Each numbered box represents 256 code points.
A map of the Supplementary Special-purpose Plane. Each numbered box represents 256 code points.

The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word.
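
To illustrate where the 2^20 comes from, here is a minimal sketch of the surrogate-pair calculation: each word of a pair carries 10 bits of a 20-bit offset. The function name `to_surrogate_pair` is hypothetical and chosen for this example; the result is checked against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into two 16-bit words."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value, 0 .. 2**20 - 1
    high = 0xD800 + (offset >> 10)       # top 10 bits, high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits, low surrogate
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")

# Cross-check against the built-in codec (big-endian, no byte order mark).
encoded = chr(0x1F600).encode("utf-16-be")
```

A BMP code point outside the surrogate range needs no such split and is stored directly as one 16-bit word.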

Punched tape with the word "Wikipedia" encoded in ASCII. Presence and absence of a hole represents 1 and 0, respectively; for example, "W" is encoded as "1010111".

Character encoding

Process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers.

Hollerith 80-column punch card with EBCDIC character set

A code unit in UTF-16 consists of 16 bits.

Declared character set for the 10 million most popular websites since 2010

UTF-8

Variable-width character encoding used for electronic communication.

Use of the main encodings on the web from 2001 to 2012 as recorded by Google, with UTF-8 overtaking all others in 2008 and exceeding 60% of the web in 2012 (since then approaching 100%). The ASCII-only figure includes all web pages that contain only ASCII characters, regardless of the declared header. Other encodings, such as GB2312, are counted under "others".

Since RFC 3629 (November 2003), the high and low surrogate halves used by UTF-16 (U+D800 through U+DFFF) and code points not encodable by UTF-16 (those after U+10FFFF) are not legal Unicode values, and their UTF-8 encoding must be treated as an invalid byte sequence.
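
This rule can be observed with a strict UTF-8 decoder. The sketch below assumes Python 3's built-in `utf-8` codec, which rejects both cases; the helper `is_valid_utf8` is a hypothetical name introduced for the example.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# ED A0 80 is the three-byte pattern that would stand for the surrogate U+D800.
surrogate_bytes = b"\xed\xa0\x80"
# F4 90 80 80 is the four-byte pattern that would stand for U+110000.
beyond_range_bytes = b"\xf4\x90\x80\x80"
# F4 8F BF BF is U+10FFFF, the last code point UTF-16 can encode.
last_valid_bytes = b"\xf4\x8f\xbf\xbf"

print(is_valid_utf8(surrogate_bytes))
print(is_valid_utf8(beyond_range_bytes))
print(is_valid_utf8(last_valid_bytes))
```

Some decoders are more lenient by design (for example, Python's `surrogatepass` error handler), so the strict behaviour shown here should not be assumed for every library.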

Text file of The Human Side of Animals by Royal Dixon, displayed by the command in an xterm window

Plain text

Loose term for data that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.).

As Unicode-based encodings such as UTF-8 and UTF-16 become more common, that usage may be shrinking.

CCSID

16-bit number that represents a particular encoding of a specific code page.

For example, Unicode is a code page that has several encoding (so-called "transformation") forms, such as UTF-8, UTF-16 and UTF-32, and text in one of these forms may or may not be accompanied by a CCSID number indicating which encoding is in use.

Variable-width encoding

Type of character encoding scheme in which codes of differing lengths are used to encode a character set for representation, usually in a computer.

The Unicode standard has two variable-width encodings: UTF-8 and UTF-16 (it also has a fixed-width encoding, UTF-32).
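
The difference between variable and fixed width shows up when the same characters are encoded in all three forms. This is a small sketch using Python's built-in codecs; the little-endian variants are used only so that no byte order mark is added to the byte counts, and the sample characters are arbitrary.

```python
# Sample characters: U+0041 (A), U+00E9, U+20AC, and U+1F600 (outside the BMP).
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]
encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Map each code point to the number of bytes it takes in each encoding.
widths = {
    ord(ch): tuple(len(ch.encode(enc)) for enc in encodings)
    for ch in samples
}

for code_point, sizes in widths.items():
    print(f"U+{code_point:04X}: {dict(zip(encodings, sizes))}")
```

UTF-8 ranges from one to four bytes here, UTF-16 uses either two or four, and UTF-32 always uses four.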

PHP

General-purpose scripting language geared toward web development.

This is an example of PHP code for the WordPress content management system.
The elePHPant, PHP mascot
A "Hello World" application in PHP 7.4 running on its built-in development server
Example output of the phpinfo function in PHP 7.1
A broad overview of the LAMP software bundle, displayed here together with Squid
Dynamic web page: example of server-side scripting (PHP and MySQL)

In 2005, a project headed by Andrei Zmievski was initiated to bring native Unicode support throughout PHP, by embedding the International Components for Unicode (ICU) library, and representing text strings as UTF-16 internally.

Main Menu of IBM i 7.1, shown inside a TN5250 client

IBM i

Operating system developed by IBM for IBM Power Systems.

IBM i5/OS logo
Original IBM i logo
Diagram showing the architectural layers of the IBM i operating system, and their relationship to hardware and user applications
IBM i during initial program load of the SLIC
Main Menu of SSP 7.5, running on top of the Advanced 36 Machine environment

IBM i uses EBCDIC as the default character encoding, but also provides support for ASCII, UCS-2 and UTF-16.