A report on Character encoding

Punched tape with the word "Wikipedia" encoded in ASCII. The presence and absence of a hole represent 1 and 0, respectively; for example, "W" is encoded as "1010111".
Hollerith 80-column punch card with EBCDIC character set

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, so that they can be stored, transmitted, and transformed using digital computers.
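
As a minimal sketch of this number assignment, the snippet below maps each character of the word from the punched-tape caption to its number and to a 7-bit pattern. The helper name `to_bits` and the 7-bit default width are assumptions made for illustration; they are not taken from the source.

```python
def to_bits(ch: str, width: int = 7) -> str:
    """Return the number assigned to one character as a fixed-width bit string."""
    return format(ord(ch), f"0{width}b")


word = "Wikipedia"
numbers = [ord(ch) for ch in word]       # one number per character
patterns = [to_bits(ch) for ch in word]  # the same numbers written as 7-bit patterns

print(numbers[0], patterns[0])  # 87 1010111
```

The first pattern matches the "1010111" given for "W" in the caption above.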



Logo of the Unicode Consortium

Unicode

20 links

Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with upright, oblique and italic alternate forms

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

ASCII chart from a pre-1972 printer manual

ASCII

15 links

ASCII (1963). Control pictures of equivalent controls are shown where they exist, or a grey dot otherwise.

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication.
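
A short sketch of what encoding to ASCII looks like in practice, using Python's built-in `ascii` codec. The helper names `encode_ascii` and `is_ascii` are hypothetical and exist only for this illustration.

```python
def encode_ascii(text: str) -> bytes:
    """Encode text as ASCII; raises UnicodeEncodeError for non-ASCII characters."""
    return text.encode("ascii")


def is_ascii(text: str) -> bool:
    """Report whether every character of text has an ASCII code."""
    try:
        encode_ascii(text)
    except UnicodeEncodeError:
        return False
    return True


data = encode_ascii("Wikipedia")
# Every ASCII code fits in 7 bits, so each byte value is below 128.
print(list(data), is_ascii("na\u00efve"))
```

Text containing a character outside the ASCII repertoire, such as the "ï" in the second call, cannot be encoded this way and needs a larger character set.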

IBM code page numbers (CPGIDs and CCSIDs) used for CJK encodings. Microsoft use of code page numbers for CJK encodings differs, and is noted in brackets where applicable.

Code page

10 links

In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers.
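
Because each code page is its own association of numbers with characters, the same byte can stand for different characters depending on which code page is assumed. The sketch below decodes one byte under three code pages shipped with Python (`cp437`, `cp1251`, `cp1252`); the choice of byte and of code pages is arbitrary and made only for illustration.

```python
raw = bytes([0xE9])  # a single byte with value 233

# Decode the same byte under three different code pages.
decoded = {name: raw.decode(name) for name in ("cp437", "cp1251", "cp1252")}

for name, ch in decoded.items():
    print(name, repr(ch), f"U+{ord(ch):04X}")
```

Each code page yields a different character for the same number, which is why the code page in use has to be known before bytes can be interpreted as text.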

Declared character set for the 10 million most popular websites since 2010

UTF-8

12 links

Use of the main encodings on the web from 2001 to 2012 as recorded by Google, with UTF-8 overtaking all others in 2008 and exceeding 60% of the web in 2012 (since then approaching 100%). The ASCII-only figure includes all web pages that contain only ASCII characters, regardless of the declared header. Other encodings, such as GB2312, are counted under "others".

UTF-8 is a variable-width character encoding used for electronic communication.
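
"Variable-width" means that different characters take a different number of bytes. The sketch below, which uses only Python's built-in `utf-8` codec, encodes four sample characters chosen for illustration and records how many bytes each one needs.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

# Map each character to its UTF-8 byte sequence.
encoded = {ch: ch.encode("utf-8") for ch in samples}
widths = [len(encoded[ch]) for ch in samples]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded[ch].hex(" "), len(encoded[ch]))
```

ASCII characters keep their single-byte form, while characters with larger code points take two, three, or four bytes.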

The first 2^16 (65,536) Unicode code points. The stripe of solid gray near the bottom is the range of surrogate halves used by UTF-16 (the white region below the stripe is the Private Use Area).

UTF-16

9 links

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid character code points of Unicode (in fact this number of code points is dictated by the design of UTF-16).
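
The figure of 1,112,064 can be reproduced with a little arithmetic: the code space runs from U+0000 to U+10FFFF, and the 2,048 surrogate values are reserved for UTF-16's own use. The sketch below shows that count together with the standard surrogate-pair calculation for code points above U+FFFF; the function name `to_surrogates` is hypothetical.

```python
TOTAL_CODE_POINTS = 0x110000            # U+0000 through U+10FFFF
SURROGATE_COUNT = 0xE000 - 0xD800       # U+D800 through U+DFFF, 2,048 values
VALID_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATE_COUNT


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a high and a low surrogate."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(VALID_CODE_POINTS, hex(high), hex(low))  # 1112064 0xd83d 0xde00
```

The pair agrees with what Python's built-in `utf-16-be` codec produces for the same code point.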

Various ISO 2022 and other CJK encodings supported by Mozilla Firefox as of 2004. (This support has been reduced in later versions to avoid certain cross-site scripting attacks.)

ISO/IEC 2022

9 links

Relationship between ECMA-43 (ISO/IEC 4873) editions and levels, and EUC.

ISO/IEC 2022, Information technology—Character code structure and extension techniques, is an ISO/IEC standard (equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202) in the field of character encoding.

ISO/IEC 8859-1 code page layout

ISO/IEC 8859-1

9 links

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987.
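
The "single-byte" and "ASCII-based" properties can be checked with Python's `latin-1` codec, as in the sketch below. One assumption to note: that codec maps all 256 byte values, including the control ranges, so it is used here as a convenient stand-in rather than as an exact model of the standard's graphic character set.

```python
text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

latin1 = text.encode("latin-1")

# Single-byte: one byte per character.
one_byte_each = len(latin1) == len(text)

# ASCII-based: ASCII text encodes to the same bytes under both codecs.
ascii_compatible = "cafe".encode("latin-1") == "cafe".encode("ascii")

# In Python's codec, each byte value equals the character's code point.
same_numbers = all(ord(ch) == b for ch, b in zip(text, latin1))

print(latin1, one_byte_each, ascii_compatible, same_numbers)
```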

Code point

5 links

In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character.
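
In Python, the built-in `ord` and `chr` functions convert between a character and its Unicode code point, which gives a quick way to see the mapping. The helper `u_notation`, which formats a code point in the conventional "U+" hexadecimal style, is a hypothetical name used only for this sketch.

```python
def u_notation(ch: str) -> str:
    """Format the code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


labels = [u_notation(ch) for ch in ("A", "\u20ac")]  # letter A and the euro sign
round_trip = chr(ord("A"))  # code point back to character

print(labels, round_trip)  # ['U+0041', 'U+20AC'] A
```

A code point is only the number; how that number is stored as bytes depends on the encoding in use, such as UTF-8 or UTF-16.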

ISO/IEC 8859

11 links

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings.

Universal Coded Character Set

7 links

The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard). It is the basis of many character encodings and continues to grow as characters from previously unrepresented writing systems are added.