KOI8-R
|
KOI8-R is an 8-bit character encoding, designed to cover Russian, which uses the Cyrillic alphabet. It also happens to cover Bulgarian. A derivative encoding is KOI8-U, which adds Ukrainian characters.
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. In recent times, both might eventually give way to Unicode.
In Russian, KOI8 stands for Kod Obmena Informatsiey, 8 bit (Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-R becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped.
KOI8-R | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF | |
0x | unused | |||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | Template:Num | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | |
8x | ─ | │ | ┌ | ┐ | └ | ┘ | ├ | ┤ | ┬ | ┴ | ┼ | ▀ | ▄ | █ | ▌ | ▐ |
9x | ░ | ▒ | ▓ | ⌠ | ■ | ∙ | √ | ≈ | ≤ | ≥ | NBSP | ⌡ | ° | ² | · | ÷ |
Ax | ═ | ║ | ╒ | ё | ╓ | ╔ | ╕ | ╖ | ╗ | ╘ | ╙ | ╚ | ╛ | ╜ | ╝ | ╞ |
Bx | ╟ | ╠ | ╡ | Ё | ╢ | ╣ | ╤ | ╥ | ╦ | ╧ | ╨ | ╩ | ╪ | ╫ | ╬ | © |
Cx | ю | а | б | ц | д | е | ф | г | х | и | й | к | л | м | н | о |
Dx | п | я | р | с | т | у | ж | в | ь | ы | з | ш | э | щ | ч | ъ |
Ex | Ю | А | Б | Ц | Д | Е | Ф | Г | Х | И | Й | К | Л | М | Н | О |
Fx | П | Я | Р | С | Т | У | Ж | В | Ь | Ы | З | Ш | Э | Щ | Ч | Ъ |
In the table above, 20 is the regular SPACE character, and 9A is the NO-BREAK SPACE.
Although RFC 1489 says that character 95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.