ISO 2022
ISO 2022, more formally ISO/IEC 2022, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying a technique for including multiple character sets in a single character encoding. Unlike the ISO 8859 character encodings, which use 8 bits for every character, ISO 2022 encodings are variable-width encodings, typically using either 8 or 16 bits per character. Several character encodings use ISO 2022 mechanisms. For example, ISO-2022-JP is a widely used character encoding for the Japanese language.
Introduction
Many languages or language families not based on the Latin alphabet, such as Greek, Russian, Arabic, or Hebrew, have historically been represented on computers with 8-bit extended ASCII encodings, including the ISO 8859 family of character sets. Written East Asian languages, specifically Chinese, Japanese, and Korean, use far more characters than fit in an 8-bit byte and were first represented on computers with language-specific double-byte encodings. ISO 2022 was developed as a technique to represent characters from multiple character sets within a single character encoding.
The ISO 2022 character encodings include escape sequences that indicate the character set of the characters that follow. The escape sequences are registered with ISO and are often three characters long, starting with the ASCII ESCAPE character (hexadecimal 1B, octal 33). These character encodings require data to be processed sequentially in a forward direction, since the correct interpretation of any byte depends on the most recently encountered escape sequence. Although the ISO 2022 encodings are still in common use, particularly ISO-2022-JP, most modern e-mail applications are moving to the simpler Unicode character encodings such as UTF-8.
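As a rough illustration of why the data has to be read front to back, the following Python sketch splits a byte stream into runs labelled with the character set in effect. The names `DESIGNATIONS` and `split_segments` are invented for this example, only the four ISO-2022-JP escape sequences are recognised, and no error recovery is attempted; it is a sketch of the idea, not a complete decoder.

```python
ESC = 0x1B  # ASCII ESCAPE (hexadecimal 1B, octal 33)

# Hypothetical lookup table for this sketch: the two bytes that follow ESC,
# mapped to (character set name, bytes per character). ISO-2022-JP only.
DESIGNATIONS = {
    b"(B": ("ASCII", 1),
    b"(J": ("JIS X 0201-1976", 1),
    b"$@": ("JIS X 0208-1978", 2),
    b"$B": ("JIS X 0208-1983", 2),
}


def split_segments(data: bytes):
    """Scan left to right and return a list of (charset_name, raw_bytes) runs."""
    segments = []
    current = DESIGNATIONS[b"(B"]  # the encoding starts in ASCII
    run = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = bytes(data[i + 1:i + 3])
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            if run:
                # Close the run that was interpreted under the previous set.
                segments.append((current[0], bytes(run)))
                run = bytearray()
            current = DESIGNATIONS[seq]
            i += 3
        else:
            # Consume one character: 1 or 2 bytes depending on the active set.
            width = current[1]
            run += data[i:i + width]
            i += width
    if run:
        segments.append((current[0], bytes(run)))
    return segments


sample = b"Hi \x1b$B$3$s\x1b(B!"
for name, raw in split_segments(sample):
    print(name, raw)
```

The byte pair `$3` is an ordinary dollar sign and digit in ASCII but a single two-byte character once `ESC $ B` has been seen, so a reader that starts in the middle of the stream cannot tell which interpretation applies.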
ISO 2022 Character Sets
Character encodings using ISO 2022 mechanisms include:
- ISO-2022-JP - Widely used encoding for Japanese. Starts in ASCII and includes the following escape sequences:
  - ESC ( B to switch to ASCII (1 byte per character)
  - ESC ( J to switch to JIS X 0201-1976 (1 byte per character)
  - ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
  - ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
- ISO-2022-JP-1 - Same as ISO-2022-JP with one additional escape sequence:
  - ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
- ISO-2022-JP-2 - Multilingual extension of ISO-2022-JP. Same as ISO-2022-JP-1 with the following additional escape sequences:
  - ESC $ A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
  - ESC . A to switch to ISO 8859-1 (1 byte per character)
  - ESC . F to switch to ISO 8859-7 (1 byte per character)
- ISO-2022-JP-3 - Same as ISO-2022-JP with two additional escape sequences:
  - ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
  - ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
- ISO-2022-KR - Korean:
  - ESC $ ) C to switch to KS X 1001-1992 (2 bytes per character)
- ISO-2022-CN - Chinese:
  - ESC $ ) A to switch to GB 2312-1980 (2 bytes per character)
  - ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character)
  - ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character)
- ISO-2022-CN-EXT - Same as ISO-2022-CN with six additional escape sequences:
  - ESC $ ) E to switch to ISO-IR-165 (2 bytes per character)
  - ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character)
  - ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character)
  - ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character)
  - ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character)
  - ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character)
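The escape sequences listed above can be observed in practice with the codecs that ship with Python's standard library (`iso2022_jp` and `iso2022_jp_2` are used here). The helper `find_escapes` is a hypothetical name introduced for this sketch, and it assumes every escape sequence in its input is exactly three bytes long, which is true of the ones that appear in this example but not of ISO 2022 in general.

```python
def find_escapes(data: bytes):
    """Return (offset, sequence) for each ESC found, assuming 3-byte sequences."""
    found = []
    i = data.find(b"\x1b")
    while i != -1:
        found.append((i, data[i:i + 3]))
        i = data.find(b"\x1b", i + 3)
    return found


text = "abc 日本語 xyz"
encoded = text.encode("iso2022_jp")

# The encoder switches to a two-byte set for the kanji and back to ASCII after.
for offset, seq in find_escapes(encoded):
    print(offset, seq)

# Decoding replays the same escape sequences and recovers the original text.
assert encoded.decode("iso2022_jp") == text

# Plain ISO-2022-JP has no Korean character set, so Hangul cannot be encoded.
try:
    "한".encode("iso2022_jp")
except UnicodeEncodeError:
    print("not representable in ISO-2022-JP")

# ISO-2022-JP-2 adds KS X 1001, so the same text encodes and round-trips.
korean = "한".encode("iso2022_jp_2")
print(korean)
assert korean.decode("iso2022_jp_2") == "한"
```

Which escape sequence an encoder emits when a character exists in more than one set (for example JIS X 0208-1978 versus JIS X 0208-1983) is an implementation choice, so output from other libraries may differ from what Python produces here.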
See also
- ISO 646
- CJK
- Mojibake
- Lunde, Ken. 1998. CJKV Information Processing. O'Reilly & Associates. ISBN 1565922247
External links
- International Organization for Standardization (http://www.iso.org/)
- ECMA-35 (http://www.ecma-international.org/publications/standards/Ecma-035.htm)
- International Register of Coded Character Sets to be Used with Escape Sequences (http://www.itscj.ipsj.or.jp/ISO-IR/)
- History of Character Codes in North America, Europe, and East Asia (http://tronweb.super-nova.co.jp/characcodehist.html)
- CJK.INF: a document on encoding Chinese, Japanese, and Korean (CJK) languages, including a discussion of the various variants of ISO 2022 (ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf). Also available by HTTP (http://examples.oreilly.com/cjkvinfo/doc/cjk.inf).