A report on Character encoding and Windows-1252

Punched tape with the word "Wikipedia" encoded in ASCII. The presence and absence of a hole represent 1 and 0, respectively; for example, "W" is encoded as "1010111".
Hollerith 80-column punch card with EBCDIC character set

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages, including Spanish, French, and German.
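
This is a minimal sketch in Python 3, whose built-in `cp1252` codec implements Windows-1252. It shows what "single-byte" means in practice. Each representable character becomes exactly one byte. Characters outside the 256-entry table cannot be encoded at all.

```python
text = "Café €5"
encoded = text.encode("cp1252")  # one byte per character: é -> 0xE9, € -> 0x80

print(encoded)  # b'Caf\xe9 \x805'
assert len(encoded) == len(text)
assert encoded.decode("cp1252") == text  # decoding reverses the table lookup

# Characters outside the 256-entry table have no Windows-1252 byte at all.
try:
    "Ω".encode("cp1252")
except UnicodeEncodeError as err:
    print("cannot encode:", err.object[err.start:err.end])
```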

UTF-8

Declared character set for the 10 million most popular websites since 2010
Use of the main encodings on the web from 2001 to 2012 as recorded by Google, with UTF-8 overtaking all others in 2008 and over 60% of the web in 2012 (since then approaching 100%). The ASCII-only figure includes all web pages that only contain ASCII characters, regardless of the declared header. Other encodings, such as GB2312, are added to "others".

UTF-8 is a variable-width character encoding used for electronic communication.
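
This short Python sketch illustrates the variable-width property: the number of bytes UTF-8 uses depends on the code point. The sample characters are arbitrary choices for illustration.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, and an emoji

for ch in samples:
    data = ch.encode("utf-8")
    # bytes.hex(" ") separates the bytes with spaces (Python 3.8+)
    print(f"U+{ord(ch):04X}: {len(data)} byte(s) {data.hex(' ')}")
```

The one-byte case coincides with ASCII, which is why text containing only ASCII characters is already valid UTF-8.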

(The term "WTF-8" has also been used humorously to refer to erroneously doubly encoded UTF-8, sometimes with the implication that CP1252 bytes are the only ones encoded.)
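
The double encoding the joke refers to can be reproduced in a few lines of Python. The names `double_encode` and `undo_double_encode` are hypothetical helpers. The sketch assumes the intermediate misreading used Windows-1252, as described above.

```python
def double_encode(text: str) -> bytes:
    """Hypothetical helper reproducing the bug: UTF-8 bytes are misread as
    Windows-1252 text, and that text is then encoded as UTF-8 a second time."""
    misread = text.encode("utf-8").decode("cp1252")  # "é" becomes "Ã©"
    return misread.encode("utf-8")


def undo_double_encode(data: bytes) -> str:
    """Hypothetical helper: undo the steps above in reverse order."""
    return data.decode("utf-8").encode("cp1252").decode("utf-8")


original = "naïve café €"
broken = double_encode(original)
print(broken.decode("utf-8"))  # naÃ¯ve cafÃ© â‚¬
assert undo_double_encode(broken) == original
```

Python's `cp1252` codec has no mapping for bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D. As a result, `double_encode` raises `UnicodeDecodeError` for text whose UTF-8 form contains those bytes; for example, "Á" is encoded as C3 81.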

ISO/IEC 8859-1

ISO/IEC 8859-1 code page layout

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987.
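
Unicode's first 256 code points (U+0000 to U+00FF) match ISO/IEC 8859-1. Decoding Latin-1 therefore amounts to an identity mapping from byte values to code points. The sketch below uses Python's `latin-1` codec. Note that this codec also maps the control-code positions that ISO/IEC 8859-1 itself does not define.

```python
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Byte value n decodes to code point n, for every n from 0x00 to 0xFF.
assert all(ord(ch) == b for ch, b in zip(decoded, all_bytes))
assert decoded.encode("latin-1") == all_bytes  # and the mapping round-trips

print(decoded[0xE9], hex(ord(decoded[0xE9])))  # é 0xe9
```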

ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/"; HTML5 changed this to Windows-1252.
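
This Python sketch shows how a decoder following HTML5 (more precisely, the WHATWG Encoding Standard) might treat an "iso-8859-1" label. The label set is a partial list chosen for illustration. The error-handler name `c1-passthrough` and the helper `decode_like_html5` are hypothetical. Python's codec leaves five bytes unassigned; passing them through as C1 code points is intended to mirror the WHATWG windows-1252 table.

```python
import codecs

# Partial, illustrative subset of labels that HTML5 decodes as windows-1252.
WINDOWS_1252_LABELS = {"ascii", "cp1252", "iso-8859-1", "iso8859-1",
                       "latin1", "us-ascii", "windows-1252"}


def _c1_passthrough(err):
    # Python's cp1252 codec has no mapping for 0x81, 0x8D, 0x8F, 0x90, 0x9D;
    # decode each such byte to the C1 code point of the same value instead.
    bad = err.object[err.start:err.end]
    return "".join(chr(b) for b in bad), err.end


codecs.register_error("c1-passthrough", _c1_passthrough)  # name chosen for this sketch


def decode_like_html5(body: bytes, label: str) -> str:
    """Decode a text/* body, treating ISO-8859-1-style labels as Windows-1252."""
    label = label.strip().lower()
    if label in WINDOWS_1252_LABELS:
        return body.decode("cp1252", errors="c1-passthrough")
    return body.decode(label)
```

A strict ISO-8859-1 decoder behaves differently. `b"\x93hi\x94".decode("latin-1")` yields the invisible C1 controls U+0093 and U+0094 instead of the curly quotes that `decode_like_html5(b"\x93hi\x94", "ISO-8859-1")` returns.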

Unicode

Logo of the Unicode Consortium
Many modern applications can render a substantial subset of the many scripts in Unicode, as demonstrated by this screenshot from the OpenOffice.org application.
Various Cyrillic characters shown with upright, oblique and italic alternate forms

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
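
Unicode's consistency rests on separating an abstract code point from its byte representation. This small Python illustration uses the euro sign, an arbitrary choice. The code point is always U+20AC, but the bytes depend on the encoding form. A legacy code page such as Windows-1252 has its own single-byte value for the same character.

```python
import unicodedata

ch = "\u20ac"
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+20AC EURO SIGN

# Same code point, different byte sequences per encoding.
for enc in ("utf-8", "utf-16-le", "utf-32-le", "cp1252"):
    print(f"{enc:>9}: {ch.encode(enc).hex(' ')}")
```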

In practice, the C1 code points are often improperly translated (mojibake) as the legacy Windows-1252 characters used by some English and Western European texts.
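
The translation described above can be written down explicitly. This Python sketch replaces each C1 code point from U+0080 to U+009F with the Windows-1252 character at the same byte value. The table and function names are hypothetical. Applied deliberately, the same mapping is one way to repair text whose Windows-1252 bytes were decoded as ISO-8859-1.

```python
# Map each C1 code point to the Windows-1252 character at the same byte value;
# the five byte values Windows-1252 leaves unassigned are skipped.
C1_TO_CP1252 = {}
for b in range(0x80, 0xA0):
    try:
        C1_TO_CP1252[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        pass  # 0x81, 0x8D, 0x8F, 0x90, 0x9D: no Windows-1252 character


def c1_as_cp1252(text: str) -> str:
    """Show C1 control code points the way a Windows-1252-assuming program would."""
    return text.translate(C1_TO_CP1252)


wrong = b"\x93Smart quotes\x94 cost \x805".decode("latin-1")  # yields C1 controls
print(c1_as_cp1252(wrong))  # “Smart quotes” cost €5
```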

Code page

IBM code page numbers (CPGIDs and CCSIDs) used for CJK encodings. Microsoft's use of code page numbers for CJK encodings differs and is noted in brackets where applicable.

In computing, a code page is a character encoding, and as such it is a specific association of a set of printable characters and control characters with unique numbers.
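
Because a code page is just such an association, the same byte can stand for entirely different characters under different code pages. This Python sketch decodes one byte, 0xE8 (chosen arbitrarily), with four code page codecs:

- 437, the original IBM PC set
- 1250, Central European
- 1251, Cyrillic
- 1252, Western European

```python
raw = b"\xe8"
for codec in ("cp437", "cp1250", "cp1251", "cp1252"):
    ch = raw.decode(codec)
    print(f"{codec}: {ch} (U+{ord(ch):04X})")
# cp437 gives Φ, cp1250 č, cp1251 и, cp1252 è
```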

1004 – Latin-1 Extended, Desk Top Publishing/Windows

ASCII

ASCII chart from a pre-1972 printer manual
ASCII (1963). Control pictures of equivalent controls are shown where they exist, or a grey dot otherwise.

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication.
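
ASCII is a 7-bit code, so every character fits in seven binary digits. That is why the punched-tape caption at the top can give "W" as 1010111. A Python sketch follows; the helper name `ascii_bits` is hypothetical.

```python
def ascii_bits(text: str) -> list:
    """Hypothetical helper: each character's ASCII code as a 7-digit binary string."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return [format(b, "07b") for b in data]


print(ascii_bits("Wikipedia")[0])  # 1010111, the "W" from the punched tape
```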

A popular further extension designed by Microsoft, Windows-1252 (often mislabeled as ISO-8859-1), added the typographic punctuation marks needed for traditional text printing.
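
Much of that typographic punctuation sits at bytes 0x91 to 0x97, inside the range ISO-8859-1 reserves for C1 control codes. This is one reason mislabeled Windows-1252 text shows missing or garbled quotes. This Python sketch lists those bytes by Unicode name and compares them with what ISO-8859-1 would yield:

```python
import unicodedata

for b in range(0x91, 0x98):
    ch = bytes([b]).decode("cp1252")
    # Under ISO-8859-1 the same byte decodes to an invisible C1 control (category "Cc").
    c1 = bytes([b]).decode("latin-1")
    print(f"0x{b:02X}: {unicodedata.name(ch)} (latin-1 gives category {unicodedata.category(c1)})")
```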

Windows code page

Windows code pages are sets of characters or code pages (known as character encodings in other operating systems) used in Microsoft Windows from the 1980s and 1990s.

The term "ANSI" is a misnomer because these Windows code pages do not comply with any ANSI standard; code page 1252 was based on an early ANSI draft that became the international standard ISO 8859-1, which adds a further 32 control codes and space for 96 printable characters.
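
The practical difference between code page 1252 and ISO 8859-1 is confined to the 32 positions that ISO 8859-1 reserves for C1 control codes (0x80 to 0x9F). This Python sketch checks that claim byte by byte, using Python's `cp1252` and `latin-1` codecs as stand-ins for the two standards:

```python
def cp1252_vs_latin1():
    """Compare Windows-1252 with ISO-8859-1 (Python's latin-1 codec) byte by byte."""
    differs, undefined = [], []
    for b in range(256):
        raw = bytes([b])
        try:
            if raw.decode("cp1252") != raw.decode("latin-1"):
                differs.append(b)
        except UnicodeDecodeError:
            undefined.append(b)  # no character assigned in Python's cp1252 table
    return differs, undefined


differs, undefined = cp1252_vs_latin1()
print(f"{len(differs)} bytes differ, from 0x{min(differs):02X} to 0x{max(differs):02X}")
print("undefined in cp1252:", [hex(b) for b in undefined])
```

In Python's tables, 27 of those 32 positions hold printable characters in code page 1252, and five are unassigned.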

ISO/IEC 8859-15

ISO/IEC 8859-15:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 15: Latin alphabet No. 9, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1999.
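
ISO/IEC 8859-15 differs from ISO/IEC 8859-1 in only eight positions. There it adds the euro sign and letters such as Š, Ž, Œ and Ÿ in place of less-used symbols. This Python sketch finds those positions using Python's `iso8859_15` codec; the helper name `latin9_changes` is hypothetical.

```python
import unicodedata


def latin9_changes():
    """List byte positions where ISO/IEC 8859-15 differs from ISO/IEC 8859-1."""
    changes = []
    for b in range(0xA0, 0x100):
        raw = bytes([b])
        old, new = raw.decode("latin-1"), raw.decode("iso8859_15")
        if old != new:
            changes.append((b, old, new))
    return changes


for b, old, new in latin9_changes():
    print(f"0x{b:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```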

All the printable characters from both ISO/IEC 8859-1 and ISO/IEC 8859-15 are also found in Windows-1252.
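
This claim can be checked mechanically. The Python sketch below treats the graphic ranges 0x20 to 0x7E and 0xA0 to 0xFF as each ISO standard's "printable" characters. That is an assumption, and it also counts the no-break space and the soft hyphen. The sketch then confirms that each of those characters can be encoded in Windows-1252.

```python
def printable_chars(codec: str) -> set:
    """Decode the graphic byte ranges (0x20-0x7E, 0xA0-0xFF) of a codec to a set."""
    graphic = bytes(range(0x20, 0x7F)) + bytes(range(0xA0, 0x100))
    return set(graphic.decode(codec))


def encodable_in_cp1252(ch: str) -> bool:
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


for codec in ("latin-1", "iso8859_15"):
    missing = [ch for ch in printable_chars(codec) if not encodable_in_cp1252(ch)]
    print(codec, "missing from cp1252:", missing)
```

The converse does not hold. Windows-1252 also has characters, such as the ellipsis "…" at 0x85, that neither ISO standard contains.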