ISO/IEC 2022

ISO 2022, ISO-2022-JP, ISO-2022, ISO/IEC 4873, ISO 2022-JP, ISO 4873, ECMA-35, ISO-2022-KR, alternative character sets, ECMA-43
ISO/IEC 2022, Information technology - Character code structure and extension techniques, is an ISO standard (equivalent to the ECMA standard ECMA-35). Many of the character sets included as ISO/IEC 2022 encodings are "double byte" encodings, where two bytes correspond to a single character.
128 Related Articles

Ecma International

ECMA, European Computer Manufacturers Association, European Computer Manufacturer's Association
A subset of ISO 2022 applied to 8-bit single-byte encodings is defined by ISO/IEC 4873, also published by Ecma International as ECMA-43.

Character encoding

character set, Computer encodings, encoding
Extended Unix Code (EUC) is an 8-bit variable-width character encoding system used primarily for Japanese, Korean, and simplified Chinese.
Simple character encoding schemes include UTF-8, UTF-16BE, UTF-32BE, UTF-16LE or UTF-32LE; compound character encoding schemes, such as UTF-16, UTF-32 and ISO/IEC 2022, switch between several simple schemes by using byte order marks or escape sequences; compressing schemes try to minimise the number of bytes used per code unit (such as SCSU, BOCU, and Punycode).
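The difference between a simple and a compound scheme can be seen with Python's built-in codecs. This is only an illustrative sketch: the variable names are hypothetical, and the byte order chosen by the generic `utf-16` codec depends on the platform.

```python
import codecs

text = "A"

# Simple schemes: the byte order is fixed by the scheme itself,
# so no marker is written in front of the data.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# Compound scheme: the generic "utf-16" codec prepends a byte order mark
# so that a decoder can tell which simple scheme follows.
compound = text.encode("utf-16")
has_bom = compound[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

print(le, be, compound, has_bom)
```

ISO/IEC 2022 plays the same role with escape sequences instead of a byte order mark.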

JIS X 0208

JIS C 6226-1978, JIS, 952
Encoding byte values ("bit combinations") are often given in column-line notation, where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash.
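As a minimal sketch of the notation described above, the helpers below convert between a byte value and its column-line form. The function names are hypothetical and chosen only for illustration.

```python
def to_column_line(byte: int) -> str:
    """Format a byte value as column/line, e.g. 0x1B -> '01/11'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte value must be in the range 0..255")
    # The high hexadecimal digit is the column, the low digit is the line.
    column, line = divmod(byte, 16)
    return f"{column:02d}/{line:02d}"


def from_column_line(notation: str) -> int:
    """Parse column/line notation back into a byte value."""
    column_text, line_text = notation.split("/")
    column, line = int(column_text), int(line_text)
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError("column and line must each be in the range 00..15")
    return column * 16 + line


print(to_column_line(0x1B))       # the ESC control character
print(from_column_line("04/01"))  # the byte for 'A'
```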
ASCII and JISCII punctuation (shown here with a heavy green border) may use alternative mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.

C0 and C1 control codes

C1 control code, C0 control characters, ISO 6429
ISO 2022 / ECMA-35 also recognizes the use of the backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the CSI sequence "Graphic Character Combination" (GCC).
Although the ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, they are rarely used.

ISO/IEC 10367

ISO 10367
Registration of a set as a 96-character set does not necessarily mean that the 0x20/A0 and 0x7F/FF bytes are actually assigned by the set; some examples of graphical character sets which are registered as 96-sets but do not use those bytes include the G1 set of I.S. 434, the box drawing set from ISO/IEC 10367, and ISO-IR-164 (a subset of the G1 set of ISO-8859-8 with only the letters, used by CCITT).
ISO/IEC 10367:1991 is a standard developed by ISO/IEC JTC 1/SC 2, defining graphical character sets for use in character encodings implementing levels 2 and 3 of ISO/IEC 4873 (as opposed to ISO/IEC 8859, which defines character encodings at level 1 of ISO/IEC 4873).


TBCS, double-byte, Double-Byte Character Set
Written East Asian languages, specifically Chinese, Japanese, and Korean, use far more characters than can be represented in an 8-bit computer byte and were first represented on computers with language-specific double byte encodings.
Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022.

KS X 1001

KS X 1001:1998, KS, KS C 5601-1987
KS X 1001 is arranged as a 94×94 table, following the structure of 2-byte code words in ISO 2022 and EUC.

GB 2312

GB2312, GB, GB 2312-80
For the two-byte character sets, the code point of each character is normally specified in so-called kuten (Japanese: 区点) form, sometimes called qūwèi (Chinese: 区位), especially when dealing with GB2312 and related standards. This form specifies a zone (区; Japanese: ku, Chinese: qū) and the point (Japanese: 点, ten) or position (Chinese: 位, wèi) of that character within the zone. ISO-2022-CN and ISO-2022-CN-EXT support the character sets GB 2312 (for simplified Chinese) and CNS 11643 (for traditional Chinese).
Characters in GB2312 are arranged in a 94x94 grid (as in ISO 2022), and the two-byte code point of each character is expressed in the kuten (or quwei) form, which specifies a row (ku or qu) and the position of the character within the row (cell, ten or wei).
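The row and cell numbers map to bytes by simple offsets. The sketch below assumes the usual conventions, which are not stated in the text above: adding 0x20 to each number gives the 7-bit form used with ISO 2022 escape sequences, and adding 0xA0 gives the 8-bit form used by EUC. The function names are hypothetical.

```python
def _check(row: int, cell: int) -> None:
    # Rows and cells in a 94x94 grid are numbered 1..94.
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")


def kuten_to_7bit(row: int, cell: int) -> bytes:
    """Two bytes in the range 0x21..0x7E (7-bit form)."""
    _check(row, cell)
    return bytes([0x20 + row, 0x20 + cell])


def kuten_to_euc(row: int, cell: int) -> bytes:
    """Two bytes in the range 0xA1..0xFE (8-bit form, as in EUC)."""
    _check(row, cell)
    return bytes([0xA0 + row, 0xA0 + cell])


print(kuten_to_7bit(1, 1).hex(), kuten_to_euc(1, 1).hex())
```

Row 1, cell 1 is the first position of the grid and row 94, cell 94 the last, so every code point lands on a printable byte pair in the 7-bit form.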


ISO 646, ECMA-6, IR-10
To represent large character sets, ISO/IEC 2022 builds on ISO/IEC 646's property that a seven-bit code normally defines 94 graphic (printable) characters, in addition to space and 33 control characters.
Like NRCS and ISO 646, within the Latin variants, the family of encodings known as the G0 set are based on a similar invariant subset of ASCII, but do not retain either nor as invariant.

Japanese language and computers

characters from Japanese scripts, computer input, encodings for Japanese
Until the 2000s, most Japanese emails were in ISO-2022-JP ("JIS encoding"), web pages were in Shift-JIS, and mobile phones in Japan usually used some form of Extended Unix Code.

ARIB STD B24 character set

As with other escape sequence types, the range 0x30–0x3F is reserved for private-use F bytes (which might be defined by further protocols such as ARIB STD-B24).
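To make the role of the F (final) byte concrete, here is a rough sketch that splits an escape sequence into its parts and checks whether the final byte falls in the private-use range. It assumes the ECMA-35 layout of ESC, then zero or more intermediate bytes in 0x20–0x2F, then one final byte in 0x30–0x7E. The function names are hypothetical.

```python
ESC = 0x1B


def split_escape_sequence(seq: bytes) -> tuple[bytes, int]:
    """Return (intermediate bytes, final byte) of one escape sequence."""
    if len(seq) < 2 or seq[0] != ESC:
        raise ValueError("sequence must start with ESC and have a final byte")
    *intermediates, final = seq[1:]
    if any(not 0x20 <= b <= 0x2F for b in intermediates):
        raise ValueError("intermediate bytes must be in 0x20..0x2F")
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final byte must be in 0x30..0x7E")
    return bytes(intermediates), final


def is_private_use_final(final: int) -> bool:
    """Final bytes 0x30..0x3F are reserved for private use."""
    return 0x30 <= final <= 0x3F


intermediates, final = split_escape_sequence(b"\x1b$B")
print(intermediates, hex(final), is_private_use_final(final))
```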
The sets are selected using ISO 2022 mechanisms for 94-sets, using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):


Big-5, Big 5, Big5 encoding
The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding.

ANSI escape code

ISO 2022 / ECMA-35 also recognizes the use of the backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the CSI sequence "Graphic Character Combination" (GCC).


A sequence is also defined for returning to ISO/IEC 2022; the registrations which support this sequence as encoded in ISO/IEC 2022 comprise (as of 2019) various Videotex formats, UTF-8, and UTF-1.
This design with 66 protected characters tried to be ISO 2022 compatible.

JIS X 0213

JIS X 0213:2004
JIS X 0213 defines several 7-bit and 8-bit encodings including EUC-JIS-2004, ISO-2022-JP-2004 and Shift JIS-2004.

HZ (character encoding)

HZ, HZ-GB-2312, HZ code
Notably, the WHATWG Encoding Standard used by HTML5 maps ISO-2022-KR, ISO-2022-CN and ISO-2022-CN-EXT (as well as HZ-GB-2312) to the "replacement" decoder, which maps all input to the replacement character, in order to prevent certain cross-site scripting and related attacks, which utilize a difference in encoding support between the client and server.
Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters.
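This property can be checked with Python's built-in `hz` codec. This is only a sketch; the `~{` marker it looks for is the shift-in sequence defined for HZ in RFC 1843, which the text above does not spell out.

```python
text = "中文"

# Encode with the standard-library HZ codec.
encoded = text.encode("hz")

# Every byte should be a printable 7-bit character:
# no ESC control character and no byte with the high bit set.
is_printable_7bit = all(0x20 <= b <= 0x7E for b in encoded)

# The Chinese run is introduced by the printable pair "~{".
starts_with_shift = encoded.startswith(b"~{")

# Decoding restores the original text.
decoded = encoded.decode("hz")

print(encoded, is_printable_7bit, starts_with_shift, decoded == text)
```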

JIS X 0201

C 6220:1969-ro, Japanese, JIS C 6220
Use of an escape sequence to switch to the JIS X 0201-1976 Kana set (1 byte per character) is not part of the ISO-2022-JP profile, but is also sometimes used.
The basic ISO-2022-JP profile does not permit the Kana set of JIS X 0201, only the Roman set and JIS X 0208 (although ISO 2022 / JIS X 0202 itself permits it).
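The character set switching can be observed with Python's built-in `iso2022_jp` codec. This is an illustrative sketch, not a description of the profile itself: the two escape sequences named in the constants are the usual designations for JIS X 0208 and ASCII, and the helper name is hypothetical.

```python
ESC_JISX0208 = b"\x1b$B"  # switch to JIS X 0208 (two bytes per character)
ESC_ASCII = b"\x1b(B"     # switch back to ASCII

text = "日本語"
encoded = text.encode("iso2022_jp")

# The kanji run is wrapped in escape sequences, and the stream is 7-bit.
starts_in_jisx0208 = encoded.startswith(ESC_JISX0208)
ends_in_ascii = encoded.endswith(ESC_ASCII)
is_7bit = all(b < 0x80 for b in encoded)


def encodable_in_basic_profile(s: str) -> bool:
    """True if this codec can encode s without error."""
    try:
        s.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


print(encoded, starts_in_jisx0208, ends_in_ascii, is_7bit)
```

If the codec follows the basic profile strictly, `encodable_in_basic_profile` would be expected to return `False` for halfwidth katakana from the JIS X 0201 Kana set.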


Registration of control functions to type "Fs" sequences must be approved by ISO/IEC JTC 1/SC 2.

CNS 11643

Chinese Standard Interchange Code, CNS 11643-1992, CNS character set
ISO-2022-CN and ISO-2022-CN-EXT support the character sets GB 2312 (for simplified Chinese) and CNS 11643 (for traditional Chinese).
CNS 11643 is a superset of ASCII designed to conform to ISO 2022.


Unicode Standard, Unicode Transformation Format, The Unicode Standard
Although ISO/IEC 2022 character sets using control sequences are still in common use, particularly ISO-2022-JP, most modern e-mail applications are converting to use the simpler Unicode transforms such as UTF-8.
Some East Asian text is still encoded in encodings such as ISO-2022, and some devices, such as mobile phones, still cannot correctly handle Unicode data.

Extended Unix Code

Though the standard defines it, no registered character set uses three bytes (although EUC-TW's unregistered G2 does).
The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8836 (94²) characters, or 830584 (94³) characters, as sequences of 7-bit codes.
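The three limits follow from using one, two, or three bytes per character, each byte taking one of 94 values. A small sketch of the arithmetic, with a hypothetical function name:

```python
GRAPHIC_POSITIONS = 94  # usable byte values per position in a 94-set


def max_set_size(bytes_per_character: int) -> int:
    """Largest character set representable with the given code length."""
    if bytes_per_character < 1:
        raise ValueError("at least one byte per character is required")
    return GRAPHIC_POSITIONS ** bytes_per_character


sizes = {n: max_set_size(n) for n in (1, 2, 3)}
print(sizes)
```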



luit is also used to properly render the output of applications that use ISO 2022 character set switching.