Polymorphic code
In computing, polymorphic code is code that mutates while keeping its original algorithm intact.
In 1992 the Bulgarian virus writer known by the pseudonym Dark Avenger released the Mutation Engine (MtE), which made polymorphic code widely known as a means of avoiding pattern recognition by antivirus software.
This technique is sometimes used by computer viruses, shellcode and computer worms to hide their presence. Most antivirus software and intrusion detection systems attempt to locate malicious code by searching through computer files and through data packets sent over a computer network. If the security software finds patterns that correspond to known computer viruses or worms, it takes appropriate steps to neutralize the threat. Polymorphic algorithms make it difficult for such software to recognize the offending code, because the code's byte patterns change with every mutation.
Encryption is the most common method of achieving polymorphism in code. However, the code cannot be encrypted in its entirety, because encrypted instructions cannot be executed. A small portion, the decryption routine, is left unencrypted and is used to decrypt and start the rest of the program. Antivirus software therefore targets this small unencrypted portion of code.
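As a minimal sketch, assuming a one-byte XOR key (real engines vary; the payload, the signature and the key values below are hypothetical), the following Python shows why encryption defeats a fixed signature: two copies of the same body, encrypted under different keys, differ from each other and neither contains the plaintext signature, yet both decrypt back to the original. The decryption step itself stays in plaintext, which is why scanners concentrate on it.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Hypothetical program body and a fixed byte pattern a scanner might look for.
payload = b"HELLO, THIS IS THE PROGRAM BODY"
signature = b"PROGRAM BODY"

# Two "generations" of the same body, encrypted under different keys.
copy1 = xor_bytes(payload, 0x21)
copy2 = xor_bytes(payload, 0x5A)

print(signature in payload)                     # True: the plaintext matches
print(signature in copy1, signature in copy2)   # False False: encrypted copies do not
print(copy1 != copy2)                           # True: each key gives different bytes
print(xor_bytes(copy1, 0x21) == payload)        # True: decryption restores the body
```

Because XOR is its own inverse, the same routine both encrypts and decrypts, which keeps the unencrypted part small.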
Malicious programmers have sought to protect their polymorphic code from this strategy by rewriting the unencrypted decryption engine each time the virus or worm propagates. In response, antivirus software uses sophisticated pattern analysis to find the underlying patterns shared by the different mutations of the decryption engine, in the hope of detecting such malware reliably.
Example
An algorithm that uses, for example, the variables A and B but not the variable C remains intact even if many instructions that change the value of C are added, because nothing in the algorithm reads C.
The original algorithm:
Start:
    GOTO Decryption_Code
Encrypted:
    ... lots of encrypted code ...
Decryption_Code:
    A = Encrypted
Loop:
    B = *A
    B = B XOR CryptoKey
    *A = B
    A = A + 1
    GOTO Loop IF NOT A = Decryption_Code
    GOTO Encrypted
CryptoKey:
    some_random_number
The same algorithm, but with many unnecessary instructions that alter C:
Start:
    GOTO Decryption_Code
Encrypted:
    ... lots of encrypted code ...
Decryption_Code:
    C = C + 1
    A = Encrypted
Loop:
    B = *A
    C = 3214 * A
    B = B XOR CryptoKey
    *A = B
    C = 1
    C = A + B
    A = A + 1
    GOTO Loop IF NOT A = Decryption_Code
    C = C^2
    GOTO Encrypted
CryptoKey:
    some_random_number
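To check that the C-altering instructions leave the algorithm intact, here is a minimal Python sketch that models both listings, assuming the encrypted region is a bytearray and CryptoKey is a single byte (both assumptions; the function names are hypothetical). A is modelled as an index into the region rather than a raw pointer, and C^2 is read as squaring.

```python
def decrypt_plain(region: bytearray, key: int) -> bytearray:
    """Model of the original loop: A walks the region, B holds one byte."""
    a = 0                          # A = Encrypted (start of the region)
    while a < len(region):         # loop until A reaches the end of the region
        b = region[a]              # B = *A
        b = b ^ key                # B = B XOR CryptoKey
        region[a] = b              # *A = B
        a = a + 1                  # A = A + 1
    return region


def decrypt_with_junk(region: bytearray, key: int) -> bytearray:
    """Same loop with C-altering instructions mixed in; A and B never read C."""
    c = 0
    c = c + 1                      # C = C + 1
    a = 0
    while a < len(region):
        b = region[a]
        c = 3214 * a               # C = 3214 * A
        b = b ^ key
        region[a] = b
        c = 1                      # C = 1
        c = a + b                  # C = A + B
        a = a + 1
    c = c * c                      # C = C^2
    return region


# Usage: both variants recover the same bytes from the same encrypted region.
encrypted = bytearray(x ^ 0x3C for x in b"secret body")
print(decrypt_plain(bytearray(encrypted), 0x3C) == decrypt_with_junk(bytearray(encrypted), 0x3C))  # True
```

Each call receives its own copy of the region because, like the pseudocode, both functions decrypt in place.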
The code inside "Encrypted" ("lots of encrypted code") could then search the code between Decryption_Code and CryptoKey and remove every instruction that alters the variable C. Before the encryption engine is used again, it could insert new unnecessary instructions that alter C, or even replace the instructions of the algorithm with different instructions that do the same thing.
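The strip-and-reinsert cycle described above can be sketched as a toy that works on the pseudocode text itself rather than on machine code. CORE, JUNK, strip_junk and add_junk are hypothetical names, and CORE is modelled on the decryption loop in the listings above. Each generation's listing differs, but removing the C-assignments always recovers the same core.

```python
import random

# The decryption loop, one pseudocode instruction per entry.
CORE = [
    "A = Encrypted",
    "Loop:",
    "B = *A",
    "B = B XOR CryptoKey",
    "*A = B",
    "A = A + 1",
    "GOTO Loop IF NOT A = Decryption_Code",
    "GOTO Encrypted",
]

# Hypothetical junk instructions: each writes only to C, which the loop never reads.
JUNK = ["C = C + 1", "C = 3214 * A", "C = 1", "C = A + B", "C = C^2"]


def strip_junk(listing):
    """Remove every instruction that assigns to C."""
    return [line for line in listing if not line.startswith("C = ")]


def add_junk(listing, rng, count=4):
    """Insert `count` randomly chosen C-assignments at random positions."""
    out = list(listing)
    for _ in range(count):
        pos = rng.randint(0, len(out))   # any position, including the ends
        out.insert(pos, rng.choice(JUNK))
    return out


rng = random.Random(2024)                 # fixed seed so runs are repeatable
gen1 = add_junk(CORE, rng)                # first mutated listing
gen2 = add_junk(strip_junk(gen1), rng)    # next generation: strip old junk, add new
print(strip_junk(gen1) == CORE, strip_junk(gen2) == CORE)  # True True
```

The same normalization also hints at the defensive side: an analyzer that discards instructions with no effect on the algorithm's data can recover a stable pattern across mutations.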
See also
- Metamorphic code
- Self-modifying code
- Alphanumeric code
- Shellcode
- Software cracking
- Security cracking