ISCII
ISCII (Indian Script Code for Information Interchange) is a coding scheme for representing various Indic scripts, as well as a Latin-based script with diacritic marks used to write Romanised Indic languages. Most of these scripts are similar in structure but differ in their letter shapes. ISCII therefore encodes the logical structure of the Indic scripts, and the script-specific letter shapes are expected to be selected by markup or font specification in rich text. In plain-text documents, the non-printing ATR character can be used to select the script-specific letter shapes (a mechanism similar to escape sequences).
Because the same codes stand for the corresponding letters in every script, manually switching the selected script transliterates the text automatically.
ISCII is a fixed-length 8-bit encoding. The lower 128 codepoints are plain ASCII, the upper 128 codepoints are ISCII-specific.
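The split between the two halves of the code space can be illustrated with a minimal Python sketch. The function name `split_iscii_bytes` is hypothetical and only labels each byte by range; it does not decode the ISCII-specific bytes into characters.

```python
def split_iscii_bytes(data: bytes):
    """Label each byte of an ISCII byte string by the half it falls in.

    Bytes 0x00-0x7F are plain ASCII; bytes 0x80-0xFF are ISCII-specific.
    Hypothetical helper for illustration only: no decoding is attempted.
    """
    return [("ascii" if b < 0x80 else "iscii", b) for b in data]


if __name__ == "__main__":
    # One ASCII letter followed by one byte from the upper half.
    for kind, value in split_iscii_bytes(b"A\xa4"):
        print(f"0x{value:02X} -> {kind}")
```

Because the encoding is fixed-length, every byte can be classified on its own without looking at its neighbours.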
Unicode has largely preserved the ISCII encoding strategy, but has assigned each script a separate codepoint range. As a result, the Indic scripts occupy a series of blocks, each 128 codepoints long.
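The parallel block layout can be sketched in Python as follows. The block start values are the Unicode block bases for these scripts; the helper `shift_script` is a hypothetical name, and the approach is a naive illustration rather than a proper transliterator, since not every offset is assigned in every block and the scripts do not correspond perfectly letter for letter.

```python
# Start of each 128-codepoint Unicode block for the major Indic scripts.
INDIC_BLOCK_BASE = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # 128 codepoints per block


def shift_script(text: str, source: str, target: str) -> str:
    """Move characters from the source block to the same offset in the target block.

    Hypothetical helper: characters outside the source block are left unchanged,
    and no check is made that the resulting codepoint is actually assigned.
    """
    src = INDIC_BLOCK_BASE[source]
    dst = INDIC_BLOCK_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari letter KA (U+0915) shifted into the Bengali block.
    print(shift_script("\u0915", "Devanagari", "Bengali"))
```

Shifting by a constant offset works here only because the Unicode blocks inherited a common ordering from ISCII; a real converter would also need per-script exception tables.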
External links
- The ISCII standard (PDF) (http://varamozhi.sourceforge.net/iscii91.pdf)
- Scripts in various programming languages to convert between ISCII and Unicode (http://cvs.sourceforge.net/viewcvs.py/indlinux/scripts/)