A report on ASCII and Character encoding

ASCII chart from a pre-1972 printer manual
Punched tape with the word "Wikipedia" encoded in ASCII. Presence and absence of a hole represent 1 and 0, respectively; for example, "W" is encoded as "1010111".
ASCII (1963). Control pictures of equivalent controls are shown where they exist, or a grey dot otherwise.
Hollerith 80-column punch card with EBCDIC character set

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication.

- ASCII
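The punched-tape caption above gives "W" as "1010111". A minimal sketch of how that bit pattern can be reproduced is shown here; the helper name `to_ascii_bits` is hypothetical and chosen only for illustration.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Return each character's ASCII code as a 7-bit binary string."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII input,
    # so every byte here is guaranteed to fit in 7 bits.
    return [format(byte, "07b") for byte in text.encode("ascii")]


bits = to_ascii_bits("Wikipedia")
print(bits[0])  # 1010111
```

Each entry in `bits` corresponds to one row of holes on the tape, with 1 for a hole and 0 for no hole.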

Common examples of character encoding systems include Morse code, the Baudot code, the American Standard Code for Information Interchange (ASCII) and Unicode.

- Character encoding

Logo of the Unicode Consortium

Unicode

9 links

Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with upright, oblique and italic alternate forms

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

Punycode, another encoding form, enables the encoding of Unicode strings into the limited character set supported by the ASCII-based Domain Name System (DNS).
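As an illustration of the Punycode sentence above, the following sketch uses Python's built-in `punycode` and `idna` codecs. The sample label and domain are arbitrary choices, and the built-in `idna` codec implements the older IDNA 2003 rules, so treat this as a demonstration rather than a complete IDN implementation.

```python
# Sample label containing one non-ASCII character (u with diaeresis).
label = "m\u00fcnchen"

# Raw Punycode: ASCII characters first, then a delimiter and encoded deltas.
puny = label.encode("punycode")
print(puny)  # b'mnchen-3ya'

# The idna codec applies Punycode per label and adds the "xn--" prefix
# that DNS uses to mark an encoded label.
idna_name = "m\u00fcnchen.de".encode("idna")
print(idna_name)  # b'xn--mnchen-3ya.de'

# The transformation is reversible.
assert puny.decode("punycode") == label
```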

Declared character set for the 10 million most popular websites since 2010

UTF-8

5 links

Use of the main encodings on the web from 2001 to 2012 as recorded by Google, with UTF-8 overtaking all others in 2008 and exceeding 60% of the web in 2012 (since then approaching 100%). The ASCII-only figure includes all web pages that contain only ASCII characters, regardless of the declared header. Other encodings such as GB2312 are added to "others".

UTF-8 is a variable-width character encoding used for electronic communication.

It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well.
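The backward compatibility described above can be checked directly. This is a small sketch using arbitrarily chosen sample characters; it shows that ASCII text produces identical bytes under both encodings, while characters outside ASCII take two to four bytes in UTF-8.

```python
# Sample characters: A, e-acute, euro sign, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of UTF-8 bytes needed for each sample character.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]

# Pure ASCII text yields exactly the same bytes in ASCII and UTF-8.
ascii_text = "plain ASCII"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Bytes of a multi-byte sequence always have the high bit set, so they can
# never be mistaken for ASCII characters.
assert all(byte >= 0x80 for byte in "\u20ac".encode("utf-8"))
```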

ISO/IEC 8859-1 code page layout

ISO/IEC 8859-1

4 links

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987.

Character (computing)

3 links

Unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.

See also Universal Character Set characters: 8 bits are not enough to represent all of them, but every one can be represented with one or more 8-bit code units in UTF-8.

Two examples of usual encodings are ASCII and the UTF-8 encoding for Unicode.

Code point

3 links

In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character.

For example, the character encoding scheme ASCII comprises 128 code points in the range 0 to 7F (hexadecimal), Extended ASCII comprises 256 code points in the range 0 to FF, and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF.
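The code point counts quoted above follow from the ranges themselves. The sketch below recomputes them and shows Python's `ord` and `chr`, which convert between a character and its Unicode code point; the constant names are made up for this example.

```python
import sys

# Upper bounds of the three ranges named in the text (hypothetical names).
ASCII_MAX = 0x7F
EXTENDED_ASCII_MAX = 0xFF
UNICODE_MAX = 0x10FFFF

# A range that starts at 0 contains (max + 1) code points.
print(ASCII_MAX + 1)           # 128
print(EXTENDED_ASCII_MAX + 1)  # 256
print(UNICODE_MAX + 1)         # 1114112

# ord maps a character to its code point; chr is the inverse.
assert ord("A") == 0x41
assert chr(0x41) == "A"

# Python exposes the largest Unicode code point as sys.maxunicode.
assert sys.maxunicode == UNICODE_MAX


def is_valid_code_point(value: int) -> bool:
    """Return True if chr() accepts the value as a Unicode code point."""
    try:
        chr(value)
    except ValueError:
        return False
    return True
```

Values just past the end of the range, such as `0x110000`, are rejected by `chr` with a `ValueError`.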

ISO/IEC 8859

3 links

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings.

While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII.

Windows-1252

3 links

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.

Pages declared as US-ASCII would also count as this character set.
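Windows-1252 and ISO/IEC 8859-1 are both single-byte, ASCII-based encodings, and they agree on most byte values. The sketch below illustrates one well-known difference, the euro sign, using Python's `cp1252` and `latin-1` codecs; the sample characters are arbitrary.

```python
euro = "\u20ac"      # euro sign
e_acute = "\u00e9"   # e with acute accent

# Both encodings store e-acute as the single byte 0xE9.
assert e_acute.encode("latin-1") == e_acute.encode("cp1252") == b"\xe9"

# ASCII text is unchanged in both.
assert "abc".encode("latin-1") == "abc".encode("cp1252") == b"abc"

# Windows-1252 assigns the euro sign to byte 0x80.
euro_cp1252 = euro.encode("cp1252")
print(euro_cp1252)  # b'\x80'

# ISO/IEC 8859-1 has no euro sign, so encoding fails.
try:
    euro.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)  # False
```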

Punched card with the Hollerith encoding of the 1964 EBCDIC character set. Contrast at the top is enhanced to show the printed characters.

EBCDIC

1 link

Extended Binary Coded Decimal Interchange Code (EBCDIC) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems.

It is an eight-bit character encoding, developed separately from the seven-bit ASCII encoding scheme.
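To make the separation from ASCII concrete, the sketch below compares byte values under ASCII and under Python's `cp037` codec. Using code page 037 as a representative EBCDIC variant is an assumption made for this example; other EBCDIC code pages differ in some positions.

```python
# The same letter maps to different byte values in the two encodings.
ascii_a = "A".encode("ascii")
ebcdic_a = "A".encode("cp037")
print(ascii_a.hex(), ebcdic_a.hex())  # 41 c1

# In ASCII the uppercase letters are contiguous: J directly follows I.
assert ord("J") - ord("I") == 1

# In EBCDIC (code page 037) there is a gap between I and J.
ebcdic_i = "I".encode("cp037")[0]
ebcdic_j = "J".encode("cp037")[0]
print(hex(ebcdic_i), hex(ebcdic_j))  # 0xc9 0xd1

# Round trip: decoding the EBCDIC bytes restores the original text.
assert "HELLO".encode("cp037").decode("cp037") == "HELLO"
```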

Various ISO 2022 and other CJK encodings supported by Mozilla Firefox as of 2004. (This support has been reduced in later versions to avoid certain cross site scripting attacks.)

ISO/IEC 2022

2 links

Relationship between ECMA-43 (ISO/IEC 4873) editions and levels, and EUC.

ISO/IEC 2022 Information technology—Character code structure and extension techniques, is an ISO/IEC standard (equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202) in the field of character encoding.

ISO 2022 itself also defines particular control codes and escape sequences which can be used for switching between different coded character sets (for example, between ASCII and the Japanese JIS X 0208), so that several sets can be used in a single document, effectively combining them into a single stateful encoding (a feature less important since the advent of Unicode).
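The switching behaviour described above can be observed with Python's `iso2022_jp` codec, which is one ISO 2022-based encoding. This is a sketch under that assumption: the constant names are hypothetical, and the escape sequences shown are the ones this codec is expected to emit for JIS X 0208 and ASCII.

```python
# Escape sequences that switch the active character set (hypothetical names).
ESC_JIS_X_0208 = b"\x1b$B"  # ESC $ B selects JIS X 0208
ESC_ASCII = b"\x1b(B"       # ESC ( B returns to ASCII

# ASCII letter, hiragana "a" (U+3042), ASCII letter.
text = "A\u3042B"
encoded = text.encode("iso2022_jp")
print(encoded)

# The encoder switches to JIS X 0208 for the hiragana, then back to ASCII.
switch_in = encoded.index(ESC_JIS_X_0208)
switch_out = encoded.index(ESC_ASCII)
assert switch_in < switch_out

# Decoding tracks the same state changes and restores the text.
assert encoded.decode("iso2022_jp") == text
```

Because the meaning of a byte depends on the most recent escape sequence, a decoder cannot start in the middle of such a stream without knowing the current state.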

The first 2^16 (65,536) Unicode code points. The stripe of solid gray near the bottom is the surrogate halves used by UTF-16 (the white region below the stripe is the Private Use Area)

UTF-16

2 links

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid character code points of Unicode (in fact this number of code points is dictated by the design of UTF-16).
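The figure of 1,112,064 can be derived from the UTF-16 design: of the 1,114,112 code points from 0 to 10FFFF, the 2,048 surrogate values are reserved and cannot represent characters. The sketch below recomputes that number and shows how a code point above FFFF is split into a surrogate pair; the function name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range 0x10000 to 0x10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


TOTAL_CODE_POINTS = 0x10FFFF + 1
SURROGATE_COUNT = 0xDFFF - 0xD800 + 1
print(TOTAL_CODE_POINTS - SURROGATE_COUNT)  # 1112064

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00
```

The pair matches what Python's own `utf-16-be` codec produces for the same character.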

UTF-16 is the only web encoding incompatible with ASCII and never gained popularity on the web, where it is used by under 0.002% (a little over 1 thousandth of 1 percent) of web pages.
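The incompatibility with ASCII is visible in the bytes themselves. As a brief sketch with an arbitrary sample string, UTF-16 stores every ASCII character in two bytes, so the result differs from the ASCII byte sequence and contains zero bytes.

```python
text = "Hi"

ascii_bytes = text.encode("ascii")
utf16_le_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark
utf16_be_bytes = text.encode("utf-16-be")  # big-endian, no byte order mark

print(ascii_bytes)     # b'Hi'
print(utf16_le_bytes)  # b'H\x00i\x00'
print(utf16_be_bytes)  # b'\x00H\x00i'

# Every ASCII character occupies two bytes in UTF-16.
assert len(utf16_le_bytes) == 2 * len(ascii_bytes)
```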