UTF-16

UTF-16BE, UTF-16LE, surrogate pair, UTF-16/UCS-2, 1200, 1201, surrogate pairs, 16-bit Unicode, 16-bit Unicode character encoding scheme, 16-bit Unicode Transfer Format
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.

Comparison of Unicode encodings

UTF-6, UTF-5, encodings
The encoding is variable-length, as code points are encoded with one or two 16-bit code units (also see comparison of Unicode encodings for a comparison of UTF-8, -16 & -32).
UTF-16 and UTF-32 are incompatible with ASCII files, and thus require Unicode-aware programs to display, print and manipulate them, even if the file is known to contain only characters in the ASCII subset.
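A minimal Python sketch of that incompatibility, using only the standard codecs (the sample string is an arbitrary choice for illustration): the same ASCII-only text yields different bytes in UTF-16, so a byte-oriented ASCII tool would see interleaved zero bytes.

```python
# Encode the same ASCII-only string three ways and compare the raw bytes.
text = "Hi"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16le_bytes = text.encode("utf-16-le")  # little-endian, no BOM

print(ascii_bytes.hex(" "))    # 48 69
print(utf8_bytes.hex(" "))     # 48 69
print(utf16le_bytes.hex(" "))  # 48 00 69 00
```

UTF-8 reproduces the ASCII bytes exactly, while UTF-16 spends two bytes on every character, including those in the ASCII subset.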

Character encoding

character set, Computer encodings, encoding
Characters in the range U+10000 to U+10FFFF in the other planes are called supplementary characters.
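As a rough illustration of how a supplementary character becomes two 16-bit code units, the sketch below applies the standard surrogate-pair arithmetic. The function name `utf16_code_units` is hypothetical and chosen only for this example; Python's built-in `utf-16-be` codec is used as a cross-check.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # BMP character: a single 16-bit code unit.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x20AC)])   # ['0x20ac']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']

# Cross-check against the built-in codec.
print(chr(0x1F600).encode("utf-16-be").hex(" "))    # d8 3d de 00
```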

Plain text

text, plain-text, texts
UTF-16 is also often used for plain text and for word-processing data files on Windows.
As Unicode-based encodings such as UTF-8 and UTF-16 become more common, that usage may be shrinking.

UTF-8

65001, Unicode (UTF-8), AL32UTF8
UTF-16 never gained popularity on the web, where UTF-8 is dominant (and considered "the mandatory encoding for all [text]" by WHATWG).
The red cells in the F row (F5 to FD) indicate leading bytes of 4-byte or longer sequences that cannot be valid because they would encode code points larger than the U+10FFFF limit of Unicode (a limit derived from the maximum code point encodable in UTF-16), and FE and FF were never defined for any purpose in UTF-8.

UTF-32

UCS-4, UTF-32BE, UTF-32LE
UTF-32 always uses four bytes per code point, which makes it close to twice the size of UTF-16 for text that lies mostly in the Basic Multilingual Plane.
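The size relationship can be checked with a short Python sketch. The sample strings and the helper name `encoded_size` are assumptions made for this illustration; the BOM-less `-le` codecs are used so that only the character data is counted.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


samples = {
    "BMP text": "Unicode \u20ac",                     # 9 BMP characters
    "supplementary text": "\U0001F600\U0001D11E",     # 2 non-BMP characters
}

for label, text in samples.items():
    utf16 = encoded_size(text, "utf-16-le")
    utf32 = encoded_size(text, "utf-32-le")
    print(f"{label}: UTF-16 = {utf16} bytes, UTF-32 = {utf32} bytes")
```

For the BMP-only sample, UTF-32 takes twice as many bytes as UTF-16; for the sample made entirely of supplementary characters, the two sizes are equal, because each such character needs a surrogate pair in UTF-16.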

Variable-width encoding

MBCS, multi-byte, multi-byte character set
The Unicode standard has two variable-width encodings: UTF-8 and UTF-16 (it also has a fixed-width encoding, UTF-32).

Unicode in Microsoft Windows

IsTextUnicode, Unicode
UTF-16 is used for text in the OS API of all currently supported versions of Microsoft Windows, including Windows 10, and has been since at least Windows CE/2000/XP/2003/Vista/7. Since insider build 17035 and the April 2018 update, Windows 10 has also improved its UTF-8 support alongside UTF-16; see Unicode in Microsoft Windows#UTF-8.
Microsoft was one of the first companies to implement Unicode (back then UCS-2, which evolved into UTF-16) in their products.

Byte order mark

BOM, U+FEFF
To assist in recognizing the byte order of code units, UTF-16 allows a Byte Order Mark (BOM), a code point with the value U+FEFF, to precede the first actual coded value.
In UTF-16, a BOM may be placed as the first character of a file or character stream to indicate the endianness (byte order) of all the 16-bit code units of the file or stream.
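A minimal sketch of BOM-based byte-order detection in Python follows. The function name `detect_utf16_byte_order` and its return strings are hypothetical; the BOM constants come from the standard `codecs` module.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Return the byte order signalled by a leading UTF-16 BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little-endian"
    return "unknown"  # no BOM: the byte order must be known some other way


print(detect_utf16_byte_order(b"\xfe\xff\x00A"))  # big-endian
print(detect_utf16_byte_order(b"\xff\xfeA\x00"))  # little-endian
print(detect_utf16_byte_order(b"A\x00"))          # unknown
```

This sketch assumes the input is already known to be UTF-16. A general-purpose detector would also need to rule out UTF-32LE, whose BOM (FF FE 00 00) begins with the same two bytes as the little-endian UTF-16 BOM.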

CCSID

The IBM i operating system designates CCSID (code page) 13488 for UCS-2 encoding and CCSID 1200 for UTF-16 encoding, though the system treats them both as UTF-16.
For example, Unicode is a code page that has several encoding forms, like UTF-8, UTF-16 and UTF-32.

GSM 03.38

3GPP 23.038, 3GPP TS 23.038, GSM 7 bit default alphabet
iPhone handsets use UTF-16 for Short Message Service instead of UCS-2 described in the 3GPP TS 23.038 (GSM) and IS-637 (CDMA) standards.
However, since modern programming environments do not provide encoders or decoders for UCS-2, some cell phones (e.g. iPhones) use UTF-16 instead of UCS-2.

Plane (Unicode)

Basic Multilingual Plane, Supplementary Multilingual Plane, BMP
UCS-2 differs from UTF-16 in being a fixed-length encoding that can encode only the characters of the BMP.
The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word.
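The arithmetic behind the 17-plane limit, and behind the figure of 1,112,064 valid code points quoted elsewhere on this page, can be worked through in a few lines of Python. The variable names are chosen for this illustration only.

```python
PLANE_SIZE = 2**16                      # code points per plane

bmp = PLANE_SIZE                        # reachable with a single 16-bit word
supplementary = 1024 * 1024             # 1024 high x 1024 low surrogates = 2**20

total_code_points = bmp + supplementary
planes = total_code_points // PLANE_SIZE

# The surrogate range U+D800..U+DFFF is reserved and encodes no characters itself.
surrogates = 0xDFFF - 0xD800 + 1
valid_code_points = total_code_points - surrogates

print(planes)              # 17
print(total_code_points)   # 1114112
print(surrogates)          # 2048
print(valid_code_points)   # 1112064
```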

Unicode

Unicode Standard, Unicode Transformation Format, The Unicode Standard
The Unicode standard defines UTF-8, UTF-16, and UTF-32, and several other encodings are in use.

Joliet (file system)

Joliet, Joliet file system, Joliet ("CDFS")
The Joliet file system, used in CD-ROM media, encodes file names using UCS-2BE (up to sixty-four Unicode characters per file name).
Joliet accomplishes this by supplying an additional set of filenames that are encoded in UCS-2BE (UTF-16BE in practice since Windows 2000).

Code page

codepage, code pages, OEM character set
The IBM i operating system designates CCSID (code page) 13488 for UCS-2 encoding and CCSID 1200 for UTF-16 encoding, though the system treats them both as UTF-16.

Universal Coded Character Set

ISO 10646, Universal Character Set, ISO/IEC 10646
UTF-16 arose from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set) once it became clear that more than 2^16 code points were needed.
The first amendment to the original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP.

International Components for Unicode

ICU
To specify a non-BMP character in Java 7 regular expressions, ICU, and Perl, the syntax \x{1D11E} must be used; similarly, in ECMAScript 2015 (JavaScript), the escape format is \u{1D11E}.
ICU has historically used UTF-16 and still uses only UTF-16 for Java; for C/C++, UTF-8 is also supported, including the correct handling of "illegal UTF-8".

Bit

bits, binary digit, binary digits
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.

Code point

codepoint, code points, character codes
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.

JavaScript

Server-side JavaScript, client-side JavaScript, JS
UTF-16 is used internally by systems such as Windows, Java and JavaScript.

MacOS

Mac OS X, OS X, Mac
UTF-16 is rarely used for files on Unix/Linux or macOS.

WHATWG

Web Hypertext Application Technology Working Group, HTML5 working group, The Web Hypertext Application Technology Working Group
UTF-16 never gained popularity on the web, where UTF-8 is dominant (and considered "the mandatory encoding for all [text]" by WHATWG).

ISO/IEC JTC 1/SC 2

ISO/IEC JTC1 SC2, SC 2, WG2
Two groups worked on this in parallel, ISO/IEC JTC 1/SC 2 and the Unicode Consortium, the latter representing mostly manufacturers of computing equipment.

Unicode Consortium

Unicode Technical Committee, The Unicode Consortium, Unicode
Two groups worked on this in parallel, ISO/IEC JTC 1/SC 2 and the Unicode Consortium, the latter representing mostly manufacturers of computing equipment.

Internet Engineering Task Force

IETF, Internet Engineering Task Force (IETF), IETF Working Group
The UTF-16 encoding scheme was developed as a compromise to resolve this impasse in version 2.0 of the Unicode standard in July 1996 and is fully specified in RFC 2781 published in 2000 by the IETF.

Emoji

Animoji, emojis, Emoji characters
As of Unicode 9.0, some modern non-Latin Asian, Middle-Eastern, and African scripts fall outside the range that UTF-16 encodes in a single 16-bit code unit (the BMP), as do most emoji characters.
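One practical consequence is that such characters count as two units in systems that measure strings in UTF-16 code units. The sketch below, with the hypothetical helper name `utf16_length`, compares Python's code-point count with the UTF-16 code-unit count.

```python
def utf16_length(text: str) -> int:
    """Number of 16-bit code units needed to store the text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


letter = "A"            # U+0041, inside the BMP
emoji = "\U0001F600"    # U+1F600, outside the BMP

print(len(letter), utf16_length(letter))  # 1 1
print(len(emoji), utf16_length(emoji))    # 1 2
```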