Japanese language and computers
Adapting computers to the Japanese language raises many issues, some unique to Japanese and others common to languages that use double-byte character encodings. Some problems relate to transliteration and romanization, some to character encoding, and some to the input of Japanese text.
Broadly, most of these issues concern either the presentation or the input of Japanese text.
Character encodings
There are several standard methods of encoding Japanese characters for use on a computer, including JIS (ISO-2022-JP), Shift JIS (SJIS), EUC-JP, and Unicode. While mapping the set of kana is a simple matter, kanji have proven more difficult. Despite standardization efforts, no single encoding scheme has become the de facto standard, and multiple encodings are still in use today. For example, Japanese e-mail has traditionally used JIS encoding, and web pages Shift JIS. If a program fails to determine which encoding a text uses, the result is mojibake (文字化け, literally "character transformation"): misconverted characters that leave the text unreadable.
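The effect can be reproduced with a short Python sketch that relies only on the standard library codecs `iso2022_jp`, `shift_jis`, `euc_jp`, and `utf-8`. The helper names (`encode_all`, `misread`, `guess_encoding`) and the sample text are assumptions made for illustration, and the trial-decoding approach in `guess_encoding` is a rough heuristic, not how any particular mail client or browser detects encodings.

```python
TEXT = "日本語"  # sample text ("Japanese language"), chosen for illustration

# Python codec names for the encodings discussed above.
CODECS = ["iso2022_jp", "shift_jis", "euc_jp", "utf-8"]


def encode_all(text):
    """Return a dict mapping each codec name to the encoded bytes."""
    return {codec: text.encode(codec) for codec in CODECS}


def misread(text, written_as, read_as):
    """Encode with one codec and decode with another, as a confused program would.

    Undecodable bytes become U+FFFD so the garbled result can be inspected.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


def guess_encoding(data, candidates=("iso2022_jp", "utf-8", "euc_jp", "shift_jis")):
    """Hypothetical helper: return the first candidate that decodes cleanly.

    The order matters: ISO-2022-JP is pure 7-bit data, so it must be tried
    before codecs that also accept plain ASCII bytes. Returns None if no
    candidate fits. Short or ambiguous inputs can still be misidentified.
    """
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


if __name__ == "__main__":
    # The same three characters produce different byte sequences per encoding.
    for codec, raw in encode_all(TEXT).items():
        print(f"{codec:10} {len(raw):2} bytes  {raw.hex()}")

    # Reading Shift JIS bytes as EUC-JP yields mojibake instead of the text.
    print(misread(TEXT, "shift_jis", "euc_jp"))
```

Trial decoding of this kind only rules encodings out. Real text often decodes without error under more than one encoding, which is why explicit labels such as a MIME `charset` parameter are more reliable than guessing.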
A character set standard such as JIS may not include every character that is required, so gaiji (外字, "external characters") are sometimes used to supplement it. Gaiji may come in the form of external font packs in which normal characters have been replaced with new characters, or in which new characters have been added to unused character positions. However, gaiji are impractical on the Internet, because the font must be transferred together with the text for the gaiji to display. As a result, such characters are either replaced with similar or simpler characters, or the text is written in a larger character set (such as Unicode) that includes the required character.
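As a minimal sketch of the two fallbacks just described, the following Python code checks which characters of a string cannot be represented in a given encoding and either substitutes a simpler form or leaves the text for a larger character set. The substitution table `SIMPLER_FORMS` and the function names are hypothetical; the example assumes that Python's `shift_jis` codec, which covers JIS X 0208, cannot encode the name variant 𠮷 (U+20BB7).

```python
# Hypothetical substitution table: variant character -> common JIS character.
SIMPLER_FORMS = {"𠮷": "吉", "髙": "高"}


def can_encode(ch, codec):
    """Return True if the single character ch is representable in codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def missing_characters(text, codec="shift_jis"):
    """List the characters of text that the target character set lacks."""
    return [ch for ch in text if not can_encode(ch, codec)]


def substitute_missing(text, codec="shift_jis"):
    """Replace unrepresentable characters with a simpler form, or '?' if none is known."""
    return "".join(
        ch if can_encode(ch, codec) else SIMPLER_FORMS.get(ch, "?")
        for ch in text
    )


if __name__ == "__main__":
    name = "𠮷田"  # a surname written with a variant of 吉
    print(missing_characters(name))       # characters Shift JIS cannot hold
    print(substitute_missing(name))       # fallback 1: write a simpler character
    print(name.encode("utf-8").hex())     # fallback 2: use a larger character set
```

Substitution loses information, since the reader can no longer tell which variant was intended, whereas encoding in Unicode keeps the original character at the cost of requiring Unicode support on the receiving side.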
Text input
Typing Japanese text on a computer is complicated because a keyboard has far too few keys to assign one to each character of the Japanese writing system. On modern computers, the user usually enters the reading of the text first; an input method editor (IME) then shows a list of candidate kanji that match the reading phonetically and lets the user choose the correct characters. Input method editors are also known as front-end processors (FEPs). More advanced FEPs convert whole phrases rather than single words, which increases the likelihood that the desired characters are the first option presented.
The reading can be entered either via romanization (rōmaji nyūryoku) or by direct kana input (kana nyūryoku). Direct kana input has become far less common than romaji input, although it is still widely supported. There are two main systems for the romanization of Japanese, Kunrei-shiki and Hepburn, but "keyboard romaji" (also known as wāpuro rōmaji or "word processor romaji") generally accepts a loose combination of both. IME implementations may even handle letters that are unused in any romanization scheme, such as L, by converting them to the most appropriate equivalent.
With kana input, each key on the keyboard corresponds directly to one kana. The kana are arranged on the keyboard according to either the JIS keyboard layout or the Oyayubi shift (thumb-shift) layout, which is now obsolete.
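The romaji input path can be illustrated with a toy Python sketch: keystrokes are converted to kana by greedy longest-match lookup in a table that accepts both Kunrei-shiki and Hepburn spellings, and the resulting reading is looked up in a candidate dictionary. Everything here is an assumption made for illustration: the table is deliberately partial, the dictionary `CANDIDATES` holds only two readings, and the handling of letters such as L is left out. Real IMEs use far larger dictionaries and phrase-level analysis.

```python
# Partial romaji table (illustrative only). Kunrei-shiki spellings (si, ti, tu,
# hu, zi) and Hepburn spellings (shi, chi, tsu, fu, ji) map to the same kana.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "si": "し", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "ti": "ち", "chi": "ち", "tu": "つ", "tsu": "つ",
    "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "hu": "ふ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ga": "が", "gi": "ぎ", "gu": "ぐ", "ge": "げ", "go": "ご",
    "za": "ざ", "zi": "じ", "ji": "じ", "zu": "ず", "ze": "ぜ", "zo": "ぞ",
    "n": "ん",
}

# Consonants whose doubling is written with the small kana っ (n is excluded).
DOUBLING_CONSONANTS = set("bcdfghjkmpqrstvwxyz")

# Toy candidate dictionary: reading in hiragana -> homophonous spellings.
CANDIDATES = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "はし": ["橋", "箸", "端"],
}


def romaji_to_hiragana(text):
    """Convert lowercase romaji to hiragana by greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # A doubled consonant such as "tt" produces っ plus the next syllable.
        if ch in DOUBLING_CONSONANTS and text[i + 1:i + 2] == ch:
            out.append("っ")
            i += 1
            continue
        # Try the longest spelling first so "shi" wins over "s" + "hi".
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(ch)  # unknown input is passed through unchanged
            i += 1
    return "".join(out)


def candidates(romaji):
    """Return the conversion candidates for a romaji reading, kana form last."""
    reading = romaji_to_hiragana(romaji)
    return CANDIDATES.get(reading, []) + [reading]


if __name__ == "__main__":
    print(candidates("kanji"))  # Hepburn spelling
    print(candidates("kanzi"))  # Kunrei-shiki spelling, same candidates
```

Because several spellings map to the same kana, the user does not need to commit to one romanization system, which is the loose combination that keyboard romaji allows.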
External links
- A complete introduction to Japanese character encodings (http://www.cs.mcgill.ca/~aelias4/encodings.html)
- CJK.INF (ftp://ftp.ora.com/pub/examples/nutshell/cjkv/doc/cjk.inf), a document providing information on CJK (that is, Chinese, Japanese, and Korean) character set standards and encoding systems
- Japanese text encoding (http://lfw.org/text/jp.html)