Obfuscated code
Obfuscated code is source code that is (perhaps intentionally) very hard to read and understand. Some languages are more prone to obfuscation than others; C, C++ and Perl are the ones most often cited as easy to obfuscate. Macro preprocessors are often used to create hard-to-read code by hiding the language's standard syntax and grammar from the main body of the code. The term shrouded code has also been used.
There are also programs known as obfuscators that may operate on source code, object code, or both, for the purpose of deterring reverse engineering.
Recreational obfuscation
Code is sometimes obfuscated deliberately for recreational purposes. There are programming contests which reward the most creatively obfuscated code: the International Obfuscated C Code Contest, Obfuscated Perl Contest, International Obfuscated Ruby Code Contest and Obfuscated PostScript Contest.
There are many varieties of interesting obfuscation, ranging from simple keyword substitution and the use or omission of whitespace to create artistic effects, to clever self-generating or heavily compressed programs.
Short obfuscated Perl programs that print "Just another Perl hacker" or a similar phrase are often found in the signatures of Perl programmers. See: Just another Perl hacker.
Examples
Take this infamous example from Internet lore:
#include <stdio.h>
main(t,_,a)char *a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13?
main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t,
"@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#\
;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# \
){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}{rl#'{n' ')# \
}'+}##(!!/")
:t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
:0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
"!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
Although unintelligible at first glance, it is a legal C program (written in the old K&R style, with an old-style parameter declaration and implicit int types) which, when compiled and run, prints the 12 verses of The 12 Days of Christmas. All the strings required for the poem are inlined in the code in an encoded form. The code then iterates through the 12 days, printing what each verse needs.
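The encoding can be reproduced outside the C program. The sketch below is written in Python rather than C for readability. It assumes, from reading the listing above, that the second string literal acts as a 62-character key in which each character of the first half stands for the character 31 positions later (the putchar(31[a]) call), and that '/' separates phrases in the encoded text. The names KEY and decode are illustrative and do not appear in the original program.

```python
# Key copied from the second string literal of the C program above.
KEY = "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"

def decode(encoded):
    """Replace each character with the one 31 places after its first
    occurrence in KEY, mirroring putchar(31[a]) in the C source.
    Assumes every input character occurs in the first half of KEY."""
    return "".join(KEY[KEY.index(ch) + 31] for ch in encoded)

# The first three '/'-separated phrases of the encoded text.
print(repr(decode("@n'+,#'")))  # 'On the '
print(repr(decode("*{}w+")))    # 'first'
print(repr(decode("w#cdnr")))   # 'second'
```

The recursion in the C program does the same table lookup one character at a time, using the first argument of main to select which phrase to print.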
Another example is a program whose source listing was formatted to resemble an empty tic-tac-toe board. Each pass through the program modified the source code to show a turn in the game, ready to be executed for the next move.
Yet another example is this short program that generates mazes of arbitrary length:
char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C);
--            E;             J[              E]             =T
[E   ]=  E)   printf("._");  for(;(A-=Z=!Z)  ||  (printf("\n|"
)    ,   A    =              39              ,C             --
)    ;   Z    ||    printf   (M   ))M[Z]=Z[A-(E   =A[J-Z])&&!C
&    A   ==             T[                                  A]
|6<<27<rand()||!C&!Z?J[T[E]=T[A]]=E,J[T[A]=A-Z]=A,"_.":" |"];}
Note the shape of the corridors in the program. The program overwrites a string literal at run time, so compilers that place string literals in read-only memory need an option like GCC's -fwritable-strings (offered by older GCC releases) to produce a working executable.
Professional obfuscation
Code compiled to an intermediate form, such as Java bytecode and .NET assemblies, is often obfuscated professionally. Professional obfuscation of such programs helps protect against reverse engineering and, in some cases, also makes applications smaller and more efficient. Professional obfuscation may be regarded by some as an example of security through obscurity.
Obfuscation and information-hiding
One definition of "code obfuscation" is a set of transformations on a program that preserve its black box specification while making its internals difficult to reverse-engineer. There turn out to be many such transformations.
For example, languages such as Java, C#, and Lisp store a program's symbol table (the names of classes, methods and fields) within the compiled output. One common obfuscation is to rename every class from something descriptive like "Encryption_Index" to a meaningless sequence such as "rb". The class methods can be renamed to a(), b(), etc.
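The renaming idea can be sketched in a few lines. The example below is a toy written in Python (3.9 or later, for ast.unparse), not a description of how any commercial Java or .NET obfuscator works. It renames only plain functions, parameters and variables, leaves builtins alone, and assumes the input does not already use the short names it hands out. Renamer, obfuscate and encryption_index are illustrative names.

```python
import ast
import builtins
import itertools
import keyword
import string

def short_names():
    """Yield a, b, ..., z, aa, ab, ... skipping keywords and builtin names."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            name = "".join(letters)
            if not keyword.iskeyword(name) and not hasattr(builtins, name):
                yield name

class Renamer(ast.NodeTransformer):
    """Replace every non-builtin identifier with a meaningless short name."""

    def __init__(self):
        self.mapping = {}            # descriptive name -> short name
        self._fresh = short_names()

    def _rename(self, name):
        if hasattr(builtins, name):  # keep len, str, print, ... usable
            return name
        if name not in self.mapping:
            self.mapping[name] = next(self._fresh)
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        return self.generic_visit(node)   # then rename parameters and body

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node

def obfuscate(source):
    """Return (renamed source, mapping). Toy only: attributes, imports,
    classes and keyword arguments at call sites are not handled."""
    renamer = Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping

SOURCE = """
def encryption_index(key_length, rounds):
    total = key_length * rounds
    return total + len(str(rounds))
"""

obfuscated, mapping = obfuscate(SOURCE)
print(obfuscated)
print(mapping)
```

The black box behaviour is unchanged: calling the renamed function with the same arguments returns the same value, but the names no longer say anything about intent.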
When writing source code, programmers generally create a great deal of structure, following the rules of structured programming, OOP, and other methodologies. Compilers tend to propagate this structure into compiled code. The job of a good obfuscator is to destroy as much as possible of the structure that makes a program human-readable.
Uses for obfuscation
Makes reverse engineering more difficult
Even when a language is compiled to an executable or bytecode file, someone may choose to run a decompiler which converts these files back into human-readable form (generally without comments). This could help them understand whatever lies hidden within the source code, against the wishes of the code's creator. Obfuscation serves to increase the difficulty of decompilation, usually forcing someone who wants that information to use more costly forms of reverse engineering.
However, language obfuscation can sometimes be defeated (reverse engineered) easily. For example, some websites obscure their JavaScript to prevent the code from being copied or modified. This can be defeated quickly by viewing the DOM of the page.
Minimizes code size
Obfuscation usually breaks down structures which make programs modular and maintainable. This has the pleasant side-effect of reducing code size in many cases. For example, in dynamic languages that incorporate a symbol table with the executable code, simple variable renaming can save a great deal of space in the resulting code footprint. This is a crucial consideration if code size must be kept to a minimum, as with code that must be sent over a network or embedded into a small device.
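The effect of renaming on size can be observed directly in any format that keeps identifiers in the compiled output. The sketch below uses CPython code objects serialized with marshal as a stand-in for such a format; this is an assumption made for illustration, and the exact byte counts depend on the interpreter version. DESCRIPTIVE, RENAMED and compiled_size are illustrative names.

```python
import marshal

# The same function twice: once with descriptive names, once renamed.
DESCRIPTIVE = """
def compute_encryption_index(key_length_in_bits, number_of_rounds):
    intermediate_total = key_length_in_bits * number_of_rounds
    return intermediate_total + number_of_rounds
"""

RENAMED = """
def a(b, c):
    d = b * c
    return d + c
"""

def compiled_size(source):
    """Size in bytes of the serialized code object, names included."""
    code = compile(source, "<example>", "exec")
    return len(marshal.dumps(code))

saved = compiled_size(DESCRIPTIVE) - compiled_size(RENAMED)
print("bytes saved by renaming:", saved)
```

The instructions are identical in both versions; the whole difference comes from the shorter strings in the name tables.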
Concealment of evidence
Spammers frequently use obfuscated JavaScript or HTML code in spam messages. The obfuscated message, when displayed by an HTML-capable e-mail client, appears as a reasonably normal message -- albeit with obnoxious JavaScript behaviors such as spawning pop-up windows. However, when the source is viewed, the obfuscations make it far more difficult for investigators to discern where the links go, or what the JavaScript code does. [1] (http://www.pcplus.co.uk/tips/default.asp?pagetypeid=2&articleid=5583&subsectionid=390)
Dealers in spamming software have sold JavaScript obfuscators for the purpose of confounding investigators. Some of the techniques use JavaScript's dynamic nature -- a piece of code is stored as an encrypted string, which is decrypted and evaluated. This may be done several times. Other techniques include insertion of dummy code, as well as dummy HTML links to legitimate pages.
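The store, decode and evaluate pattern described above can be sketched as follows. The example is in Python rather than JavaScript and uses Base64, which is an encoding rather than encryption; both are substitutions made for illustration. The peel function shows why such layering is weak protection: an investigator can unwrap every layer without running the code. The names wrap and peel are illustrative.

```python
import base64
import re

def wrap(source, layers=3):
    """Hide `source` behind `layers` rounds of encode-then-exec."""
    for _ in range(layers):
        encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
        source = (
            "import base64\n"
            f"exec(base64.b64decode({encoded!r}).decode('utf-8'))"
        )
    return source

# Matches the encoded payload of one layer produced by wrap().
_LAYER = re.compile(r"b64decode\('([A-Za-z0-9+/=]+)'\)")

def peel(wrapped):
    """Undo every layer by decoding the payload, without executing it."""
    match = _LAYER.search(wrapped)
    while match:
        wrapped = base64.b64decode(match.group(1)).decode("utf-8")
        match = _LAYER.search(wrapped)
    return wrapped

hidden = wrap("result = 6 * 7")
print(hidden)        # the original statement is not visible here
print(peel(hidden))  # result = 6 * 7
```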
Disadvantages of obfuscation
Debugging
Obfuscated code is extremely difficult to debug. Variable names no longer make sense, and the structure of the code itself is likely to be altered beyond recognition. This generally forces developers to maintain two builds: one that can be easily debugged, and another for release. Both builds should be tested to make sure they behave identically.
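One way to check that two builds behave identically is to run both on the same inputs and compare the results. The following is a minimal sketch in Python; average and a stand in for the debug and release versions of one routine, and first_mismatch is an illustrative helper, not part of any obfuscation tool.

```python
def average(values):
    """Debug build: readable names."""
    return sum(values) / len(values)

def a(b):
    """Release build: the same routine after renaming."""
    return sum(b) / len(b)

def first_mismatch(debug_build, release_build, inputs):
    """Return the first input on which the builds disagree, or None."""
    for value in inputs:
        if debug_build(value) != release_build(value):
            return value
    return None

SAMPLES = [[1], [1, 2, 3], [10, 20], [0.5, 1.5]]
print(first_mismatch(average, a, SAMPLES))  # None
```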
Portability
Obfuscated code often depends on particular characteristics of the platform and compiler, which makes it difficult to maintain if either changes.
Defective obfuscators
Occasionally an obfuscator itself is buggy in a way that is difficult to reproduce. There is little one can do except find or create a newer version, or adjust the inputs to the obfuscator until it produces working output.
Conflicts with Reflection APIs
Reflection is a set of APIs in various languages that allow an object to be examined or created at run time just by knowing its class name. Many obfuscators allow specified classes to be exempt from renaming. It is also possible to let a class be renamed and then refer to it by its new name. However, the former option places limits on the dynamism of code, while the latter adds a great deal of complexity and inconvenience to the system.
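The conflict and the second workaround can be illustrated with a small simulation. The sketch below is in Python and models a build as a dictionary from class names to classes. ReportWriter, rename_map and both helper functions are hypothetical; real obfuscators expose exemptions and rename maps in their own formats.

```python
class ReportWriter:
    def describe(self):
        return "writes reports"

def instantiate(namespace, class_name):
    """Reflection-style lookup: create an object from its class name."""
    return namespace[class_name]()

# Before obfuscation the class is reachable under its descriptive name.
original_build = {"ReportWriter": ReportWriter}

# Simulated obfuscated build: the same class, now known only as "a".
obfuscated_build = {"a": ReportWriter}
rename_map = {"ReportWriter": "a"}  # hypothetical map kept from the obfuscation run

def instantiate_renamed(namespace, class_name, mapping):
    """Translate the old name through the rename map before the lookup."""
    return namespace[mapping.get(class_name, class_name)]()

print(instantiate(original_build, "ReportWriter").describe())
try:
    instantiate(obfuscated_build, "ReportWriter")
except KeyError:
    print("lookup by the old name fails after renaming")
print(instantiate_renamed(obfuscated_build, "ReportWriter", rename_map).describe())
```

Exempting ReportWriter from renaming corresponds to keeping original_build as it is; the rename map keeps the class obfuscated but must be maintained and consulted at every reflective call.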
External links
- IBM's Java Obfuscator (http://www.research.ibm.com/jax/)
- Professional Java and .NET Obfuscators (http://www.preemptive.com/)
- Microsoft Script Encoder (http://www.microsoft.com/downloads/details.aspx?FamilyId=E7877F67-C447-4873-B1B0-21F0626A6329)
- Analysis of the 12 days program (http://research.microsoft.com/~tball/papers/XmasGift/)
- Analysis of the obfuscated maze generating program (http://www.cwi.nl/~tromp/maze.html)
- Obfuscated Perl program with explanation (http://perl.plover.com/obfuscated/)
- International Obfuscated C Code Contest (http://www.ioccc.org/)
- The free perl obfuscation service (http://liraz.org/obfus.html)
- POBS Free PHP Obfuscator (http://pobs.mywalhalla.net/)
- International Obfuscated Ruby Code Contest (http://iorcc.dyndns.org/)
- Protecting Java Code Via Code Obfuscation (http://www.cs.arizona.edu/~collberg/Research/Students/DouglasLow/obfuscation.html)