Code (cryptography)
In the context of cryptography, a code is a method used to transform a message into an obscured form, preventing those who are not in on the secret from understanding what is actually transmitted. The usual method is to use a codebook that lists common words and phrases, each matched with a codeword. Messages in code are sometimes termed codetext.
Terms like code and in code are often used loosely to refer to any form of encryption. In technical work, however, there is a major distinction between codes and ciphers: the scope of the transformation involved. Codes work at the level of meaning; that is, whole words or phrases are converted into something else. Ciphers work at the level of individual letters, small groups of letters, or even, in modern ciphers, individual bits. While a code might transform "attack" into "FRGPL" or "mincemeat pie", a cipher transforms elements below the semantic level, i.e., below the level of meaning. The "a" in attack might be converted to "Q", the first "t" to "f", the second "t" to "3", and so on. Because no codebook is needed, ciphers are more convenient than codes in some situations.
Codes, on the other hand, were long believed to be more secure than ciphers, since (if the compiler of the codebook did a good job) there is no pattern of transformation that can be discovered. With the advent of automatic processors (in recent times, the electronic computer), ciphers have come to dominate cryptography.
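As a toy illustration in Python (a sketch, not a historical system), the difference in scope can be shown side by side. Apart from "attack" to "FRGPL", taken from the paragraph above, the codebook entries are hypothetical. The cipher is a plain Caesar shift, chosen only because it is the simplest letter-level cipher; it is not the substitution described in the text, which sends the two "t"s to different symbols.

```python
# Toy contrast between a code and a cipher.
# Codebook is hypothetical except "attack" -> "FRGPL", which comes from the text.
CODEBOOK = {"attack": "FRGPL", "dawn": "MINCEMEAT"}


def encode(words):
    """Code: replace each whole word (a unit of meaning) via codebook lookup."""
    return [CODEBOOK[word] for word in words]


def caesar(text, shift=3):
    """Cipher: shift each letter independently, ignoring what the words mean."""
    out = []
    for ch in text:
        if ch.isalpha() and ch.isascii():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


print(encode(["attack", "dawn"]))  # ['FRGPL', 'MINCEMEAT']
print(caesar("attack at dawn"))    # dwwdfn dw gdzq
```

`encode` raises a `KeyError` for any word missing from the codebook, while `caesar` accepts arbitrary text, which is why a cipher needs no codebook.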
One- and two-part codes
Codes are usually defined by "codebooks", which are dictionaries of codegroups listed with their corresponding plaintext. Codes originally seem to have had the codegroups assigned in "plaintext order". For example, in a code using numeric codegroups, a plaintext word starting with "a" would have a low-value group, while one starting with "z" would have a high-value group. The same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and to "decode" a codetext back into a plaintext message.
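A minimal Python sketch of a one-part code, assuming a small hypothetical vocabulary and evenly spaced five-digit groups. Because both the plaintext column and the codegroup column of the single table are sorted, the same table serves for encoding and decoding; here that is modeled with a binary search over either column.

```python
# Minimal one-part code sketch (vocabulary and group numbering are hypothetical).
import bisect

# Plaintext is listed alphabetically and codegroups are assigned in the same
# ascending order, so both columns of the single table are sorted.
VOCAB = ["attack", "bulldozer", "dawn", "enemy", "retreat", "stop", "zone"]
GROUPS = [f"{10000 + i * 12000:05d}" for i in range(len(VOCAB))]


def _lookup(keys, values, item):
    """Binary-search a sorted column and return the matching entry of the other column."""
    i = bisect.bisect_left(keys, item)
    if i == len(keys) or keys[i] != item:
        raise KeyError(item)
    return values[i]


def encode(words):
    # Encoding searches the plaintext column of the table.
    return [_lookup(VOCAB, GROUPS, w) for w in words]


def decode(groups):
    # Decoding searches the codegroup column of the same table.
    return [_lookup(GROUPS, VOCAB, g) for g in groups]


print(encode(["attack", "dawn"]))  # ['10000', '34000']
print(decode(["10000", "34000"]))  # ['attack', 'dawn']
```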
However, such "one-part" codes had a certain predictability that made it easier for others to notice patterns and "crack" or "break" the message, revealing the plaintext or part of it. To make life more difficult for codebreakers, codemakers then designed codes with no predictable relationship between the codegroups and the ordering of the matching plaintext. This meant that two codebooks were now required: one to look up plaintext to find codegroups for encoding, and the other to look up codegroups to find plaintext for decoding. Students of foreign languages work in much the same way; a Frenchman studying English, say, needs both an English-French and a French-English dictionary. Such "two-part" codes required more effort to develop, and twice as much effort to distribute and discard safely, but they were harder to break.
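A corresponding two-part sketch, again with a hypothetical vocabulary: codegroups are drawn at random (a seeded `random.Random` stands in for the codemaker), so their order says nothing about the plaintext. A Python dict does not care about ordering, but printing Part 2 in codegroup order shows why a paper user needed a separately sorted decoding book.

```python
# Minimal two-part code sketch (vocabulary and random numbering are hypothetical).
import random


def make_two_part_code(vocab, seed=None):
    """Assign distinct random 5-digit codegroups, then build both 'books'."""
    rng = random.Random(seed)
    groups = rng.sample(range(10000, 100000), len(vocab))
    # Part 1: plaintext -> codegroup, consulted when encoding.
    encode_book = {word: str(g) for word, g in zip(sorted(vocab), groups)}
    # Part 2: codegroup -> plaintext, consulted when decoding.
    decode_book = {g: word for word, g in encode_book.items()}
    return encode_book, decode_book


VOCAB = ["attack", "bulldozer", "dawn", "enemy", "retreat", "stop", "zone"]
encode_book, decode_book = make_two_part_code(VOCAB, seed=1)

codetext = [encode_book[w] for w in ["attack", "dawn"]]
print(codetext)
print([decode_book[g] for g in codetext])  # ['attack', 'dawn']

# Listed in codegroup order, Part 2 generally no longer follows the alphabet.
for group in sorted(decode_book):
    print(group, decode_book[group])
```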
Cryptanalysis of codes
While solving, say, a monoalphabetic substitution cipher is easy, solving even a simple code is difficult. Decoding a message without the codebook is a little like trying to translate a document written in an alien language: the task basically amounts to building up a "dictionary" of the codegroups and the plaintext words they represent.
One fingerhold on a simple code is the fact that some words are more common than others, such as "the" or "a" in English. In telegraphic messages, the codegroup for "STOP" (end of sentence) is usually very common. This helps define the structure of the message in terms of sentences, if not their meaning.
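Counting codegroups across many intercepts is the mechanical first step here. The Python sketch below uses made-up five-digit intercepts; it tallies groups with `collections.Counter` and then treats the most frequent one as a candidate "STOP" to split a message into sentences. Whether the top group really is "STOP" is a guess the analyst still has to test.

```python
from collections import Counter


def codegroup_frequencies(messages):
    """Count codegroups across intercepted messages, most common first."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.split())
    return counts.most_common()


def split_on(message, group):
    """Split one message into 'sentences' at a suspected STOP codegroup."""
    sentences, current = [], []
    for g in message.split():
        if g == group:
            sentences.append(current)
            current = []
        else:
            current.append(g)
    if current:
        sentences.append(current)
    return sentences


# Hypothetical intercepts; the codegroups carry no real meaning.
intercepts = [
    "48213 90412 30551 71620 11873 30551",
    "66120 30551 48213 52002 30551",
]
top_group, count = codegroup_frequencies(intercepts)[0]
print(top_group, count)                    # 30551 4
print(split_on(intercepts[0], top_group))  # [['48213', '90412'], ['71620', '11873']]
```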
Further progress can be made against a code by collecting many messages encoded with the same code and then using information from other sources:
- spies,
- newspapers,
- diplomatic cocktail party chat,
- the location from which a message was sent,
- where it was being sent to (i.e., traffic analysis),
- the time the message was sent,
- events occurring before and after the message was sent,
- the normal habits of the people sending the coded messages,
- etc.
For example, a particular codegroup found almost exclusively in messages from a particular army and nowhere else might very well indicate the commander of that army. A codegroup that appears in messages preceding an attack on a particular location may very well stand for that location.
Cribs (known or probable plaintext) give away the meanings of codegroups immediately. As codegroups are identified, they gradually build up a critical mass, with more and more codegroups revealed from context and educated guesswork. One-part codes are more vulnerable to such educated guesswork than two-part codes: if the codegroup "26839" of a one-part code is determined to stand for "bulldozer", then the lower codegroup "17598" will likely stand for a plaintext word that starts with "a" or "b".
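The one-part weakness can be expressed as a simple bracketing rule. In the Python sketch below (hypothetical helper `bracket`), the pair "26839" = "bulldozer" comes from the text, while "03112" = "advance" is an invented second recovery. The rule assumes fixed-width digit codegroups, so that string order matches numeric order, and it applies only to one-part codes; in a two-part code the bounds would mean nothing.

```python
import bisect


def bracket(unknown, recovered):
    """Return the recovered plaintext words that bound an unknown codegroup
    alphabetically in a one-part code, as (lower, upper); None means no bound.

    Assumes every codegroup is a fixed-width digit string, so string order
    matches numeric order.
    """
    if unknown in recovered:
        word = recovered[unknown]
        return word, word
    known = sorted(recovered.items())  # sorted by codegroup
    groups = [g for g, _ in known]
    i = bisect.bisect_left(groups, unknown)
    lower = known[i - 1][1] if i > 0 else None
    upper = known[i][1] if i < len(known) else None
    return lower, upper


recovered = {"26839": "bulldozer"}  # the pair from the text
print(bracket("17598", recovered))  # (None, 'bulldozer')

recovered["03112"] = "advance"      # hypothetical second recovery
print(bracket("17598", recovered))  # ('advance', 'bulldozer')
```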
Various tricks can be used to "plant" or "sow" information into a coded message, for example by executing a raid at a particular time and location against an enemy, and then examining code messages sent after the raid. Coding errors are a particularly useful fingerhold into a code; people reliably make errors, sometimes disastrous ones. Of course, planting data and exploiting errors works against ciphers as well.
The most obvious and, in principle at least, simplest way of cracking a code is to steal the codebook through bribery, burglary, or raiding parties (procedures sometimes glorified by the phrase "practical cryptology"). This is a weakness of both codes and ciphers, though codebooks are generally larger and used longer than cipher keys. While a good code may be harder to break than a cipher, the need to write and distribute codebooks is seriously troublesome.
Constructing a new code is like building a new language and writing a dictionary for it, which was an especially big job before computers. If a code is compromised, the entire task must be done all over again, and that means a lot of work for both cryptographers and the code users. In practice, when codes were in widespread use, they were usually changed on a periodic basis to frustrate codebreakers.
Once codes have been created, codebook distribution is logistically clumsy and increases the chances that the code will be compromised. There is a saying that two people can keep a secret if one of them is dead, and though that may be something of an exaggeration, a secret becomes harder to keep if it is shared among several people. Codes can be reasonably secure if they are used by only a few people, but if whole armies use the same codebook, keeping it secure becomes much more difficult.
In contrast, the security of ciphers generally depends on protecting the cipher keys. Cipher keys can be stolen or betrayed, but they are much easier to change and distribute.
Superencipherment
In more recent practice, it became typical to encipher a message after first encoding it, to provide greater security. With a numerical code, this was commonly done with an "additive": simply a long key number that was added digit by digit to the codegroups, modulo 10 (that is, without carrying). Unlike the codebooks, additives were changed frequently.
One might wonder why a code would be used at all if it had to be enciphered to provide security. Besides providing security, a well-designed code can also compress the message and provide some degree of automatic error correction.
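Non-carrying addition is easy to get wrong by hand, so here is a small Python sketch. The codegroups and the additive digits are hypothetical, and all codegroups are assumed to have the same width. Each digit of the codetext is added to the matching additive digit modulo 10, with no carry into the neighboring digit, and the receiver strips the additive by subtracting modulo 10.

```python
def _combine(groups, additive, sign):
    """Digit-by-digit, non-carrying arithmetic modulo 10.

    sign=+1 adds the additive (enciphering); sign=-1 subtracts it (deciphering).
    Assumes all codegroups have the same width.
    """
    stream = "".join(groups)
    if len(additive) < len(stream):
        raise ValueError("additive is shorter than the codetext")
    mixed = "".join(
        str((int(d) + sign * int(k)) % 10) for d, k in zip(stream, additive)
    )
    width = len(groups[0])
    return [mixed[i:i + width] for i in range(0, len(mixed), width)]


def superencipher(groups, additive):
    return _combine(groups, additive, +1)


def strip_additive(groups, additive):
    return _combine(groups, additive, -1)


codetext = ["10000", "34000"]  # hypothetical codegroups
additive = "5829147306"        # hypothetical additive key digits
sent = superencipher(codetext, additive)
print(sent)                            # ['68291', '71306']
print(strip_additive(sent, additive))  # ['10000', '34000']
```

In the second group, 4 + 7 gives 1 rather than 11, and nothing carries into the preceding digit. That is what "modulo 10, digit by digit" means.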
See also
- code, its non-cryptographic meaning
- Category:Encodings
- Trench code
- JN-25
- Zimmermann telegram
- Code talkers
This article, or an earlier version of it, incorporates material from Greg Goebel's Codes, Ciphers, & Codebreaking (http://www.vectorsite.net/ttcode.html).