Cyclone programming language
The Cyclone programming language is intended to be a safe dialect of the C programming language. It is designed to avoid buffer overflows and other vulnerabilities that are endemic in C programs, without losing the power and convenience of C as a tool for systems programming.
Cyclone was jointly developed by Greg Morrisett's group at Cornell University and AT&T Labs Research in the early 2000s. It received a certain amount of publicity in November 2001. As of June 15, 2004, the Cyclone compiler stands at version 0.8.1.
Language Features
Cyclone is meant from the ground up to avoid some of the common pitfalls of the C programming language, whilst still maintaining the look and performance of C. To this end, Cyclone places the following restrictions upon programs:
- NULL checks are inserted to prevent segmentation faults
- Pointer arithmetic is restricted
- Pointers must be initialized before use
- Dangling pointers are prevented through region analysis and limitations on free()
- Only "safe" casts and unions are allowed
- goto into scopes is disallowed
- switch labels in different scopes are disallowed
- Pointer-returning functions must execute return
- setjmp and longjmp are not supported
In order to maintain the tool set that C programmers are used to, Cyclone provides the following extensions:
- Never-NULL pointers do not require NULL checks
- "Fat" pointers support pointer arithmetic with run-time bounds checking
- Growable regions support a form of safe manual memory management
- Garbage collection for heap-allocated values
- Tagged unions support type-varying arguments
- Injections help automate the use of tagged unions for programmers
- Polymorphism replaces some uses of void *
- varargs are implemented as fat pointers
- Exceptions replace some uses of setjmp and longjmp
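To make the tagged-union entry in the list above more concrete, here is a minimal sketch in plain C of the pattern that a tagged union captures: a tag field records which member of the union is currently valid, and code reads only that member. All names here (value_tag, value, make_int, make_str, format_value) are hypothetical and chosen for illustration; this is not Cyclone syntax, and in C nothing forces the programmer to check the tag.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tag values; the names are illustrative, not Cyclone's. */
enum value_tag { VALUE_INT, VALUE_STR };

/* A hand-rolled tagged union: the tag records which member is live. */
struct value {
    enum value_tag tag;
    union {
        int i;
        const char *s;
    } as;
};

/* Constructors set the tag and the matching member together. */
struct value make_int(int i) {
    struct value v;
    v.tag = VALUE_INT;
    v.as.i = i;
    return v;
}

struct value make_str(const char *s) {
    struct value v;
    assert(s != NULL); /* programmer error, not a runtime condition */
    v.tag = VALUE_STR;
    v.as.s = s;
    return v;
}

/* Formats v into out (capacity cap), reading only the member that the tag
   names. Returns the snprintf result, or -1 for an unknown tag. */
int format_value(struct value v, char *out, size_t cap) {
    switch (v.tag) {
    case VALUE_INT:
        return snprintf(out, cap, "%d", v.as.i);
    case VALUE_STR:
        return snprintf(out, cap, "%s", v.as.s);
    }
    return -1;
}
```

A caller would write something like format_value(make_int(42), buf, sizeof buf). The constructors are the part that, per the list above, Cyclone's injections are meant to automate.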
For a better high-level introduction to Cyclone, the reasoning behind Cyclone and the source of these lists, please see [1] (http://www.research.att.com/projects/cyclone/papers/cyclone-safety.pdf).
Although Cyclone looks, in general, much like C, it should be thought of as a C-like language (http://en.wikipedia.org/wiki/Category:C_dialects). With that, let us look at more features of the language, in depth.
Pointer/reference types
Cyclone implements three kinds of reference (following C terminology these are called pointers):
- * (the normal type)
- @ (the never-NULL pointer), and
- ? (the only type with pointer arithmetic allowed, "fat" pointers).
The purpose of introducing these new pointer types is to avoid common problems when using pointers. Take for instance a function called foo that takes a pointer to an int:
int foo(int *);
Although the person who wrote the function foo could have inserted NULL checks, let us assume that for performance reasons they did not. Calling foo(NULL); will then result in undefined behavior as soon as foo dereferences its argument (typically, although not necessarily, a SIGSEGV being sent to the application). To avoid such problems, Cyclone introduces the @ pointer type, which can never be NULL. Thus, the "safe" version of foo would be:
int foo(int @);
This tells the Cyclone compiler that the argument to foo must never be NULL, which rules out the undefined behavior described above. The simple change of * to @ saves the programmer from having to write NULL checks and the operating system from having to trap NULL pointer dereferences.

Neither * nor @ pointers permit pointer arithmetic, however, and this restriction can be a rather large stumbling block for most C programmers, who are used to manipulating their pointers directly with arithmetic. Although such arithmetic is convenient, it can lead to buffer overflows and other off-by-one-style errors. To avoid this, the ? pointer type carries a known bound, the size of the array. Although this adds overhead due to the extra information stored about the pointer, it improves safety and security. Take for instance a simple (and naïve) strlen function, written in C:
int strlen(const char *s) {
    int iter = 0;
    if (s == NULL) return 0;
    while (s[iter] != '\0') {
        iter++;
    }
    return iter;
}
This function assumes that the string being passed in is terminated by NUL ('\0'). However, what would happen if char buf[] = {'h','e','l','l','o','!'}; were passed to this function? This is perfectly legal in C, yet would cause strlen to iterate through memory not necessarily associated with the string s. There are functions, such as strnlen, which can be used to avoid such problems, but these functions are not standard with every implementation of ANSI C. The Cyclone version of strlen is not so different from the C version:
int strlen(const char ? s) {
    int iter = 0, n = s.size;
    if (s == NULL) return 0;
    for (; iter < n; iter++, s++) {
        if (*s == '\0') return iter;
    }
    return n;
}
Here, strlen bounds itself by the length of the array passed to it, and so never reads past the end of the array. Each kind of pointer type can be safely cast to each of the others, and arrays and strings are automatically cast to ? by the compiler. (Casting from ? to * invokes a bounds check, and casting from ? to @ invokes both a NULL check and a bounds check. Casting from * or @ to ? results in no checks whatsoever; the resulting ? pointer has a size of 1.)
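The idea behind a fat pointer can be approximated in plain C by carrying the bound alongside the address. The following is only a sketch under that assumption: the names fat_cstr, fat_from_array, fat_read and fat_strlen are hypothetical, the checks are written by hand rather than inserted by a compiler, and how Cyclone itself represents ? pointers or reports a failed check is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of a Cyclone `?` pointer: an address plus the
   number of elements that may be read starting from it. */
struct fat_cstr {
    const char *ptr;
    size_t size;
};

/* Build a fat pointer from an array whose length the caller knows.
   A NULL array is given size 0 so that every read is rejected. */
struct fat_cstr fat_from_array(const char *arr, size_t n) {
    struct fat_cstr f;
    f.ptr = arr;
    f.size = (arr == NULL) ? 0 : n;
    return f;
}

/* Bounds-checked read: stores the element in *out and returns 1, or
   returns 0 when the index is out of range. */
int fat_read(struct fat_cstr f, size_t index, char *out) {
    assert(out != NULL); /* programmer error, not a runtime condition */
    if (f.ptr == NULL || index >= f.size)
        return 0;
    *out = f.ptr[index];
    return 1;
}

/* strlen bounded by the array size, mirroring the Cyclone version above:
   stops at the first NUL or at the bound, whichever comes first. */
size_t fat_strlen(struct fat_cstr s) {
    size_t i;
    char c;
    for (i = 0; fat_read(s, i, &c); i++) {
        if (c == '\0')
            return i;
    }
    return s.size;
}
```

With the unterminated array from the text, fat_strlen(fat_from_array(buf, sizeof buf)) stops at the bound instead of walking into unrelated memory. The cost is the one the text mentions: every pointer carries an extra size field and every access pays for a comparison.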
Dangling Pointers and Region Analysis
Consider the following code, in C:
char *itoa(int i) {
    char buf[20];
    sprintf(buf,"%d",i);
    return buf;
}
This returns a pointer to an object that is allocated on the stack of the function itoa, which is no longer available after the function returns. While gcc and other compilers will warn about such code, the following variant will typically compile without warnings:
char *itoa(int i) {
    char buf[20],*z;
    sprintf(buf,"%d",i);
    z = buf;
    return z;
}
Cyclone performs region analysis on each segment of code, preventing dangling pointers such as the one returned from this version of itoa. All of the local variables in a given scope are considered to be part of the same region, separate from the heap or any other local region. Thus, when analyzing itoa, the compiler would see that z is a pointer into the local stack, and would report an error.
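For comparison, plain C has no region analysis, so avoiding the dangling pointer is left to the programmer. One common convention, sketched below, is to have the caller supply the storage so that the returned pointer refers to memory the caller already owns. The function name itoa_into and its signature are hypothetical; this illustrates the C workaround, not how Cyclone resolves the problem.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical replacement for itoa: the caller owns the buffer, so the
   result cannot outlive its storage. Returns buf on success, or NULL if
   the decimal text (plus terminator) does not fit in cap bytes. */
char *itoa_into(int i, char *buf, size_t cap) {
    int n;
    assert(buf != NULL); /* programmer error, not a runtime condition */
    n = snprintf(buf, cap, "%d", i);
    if (n < 0 || (size_t)n >= cap)
        return NULL; /* output error or truncated result */
    return buf;
}
```

A caller declares char buf[20]; in its own scope and passes it in, so the pointer that comes back is only as short-lived as the caller's own variable.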
Manual Memory Management
Examples
The best example to start with is the classic Hello world program:
#include <stdio.h>
#include <core.h>
using Core;

int main(int argc, string_t ? args) {
    if (argc <= 1) {
        printf("Usage: hello-cyclone <name>\n");
        return 1;
    } else {
        printf("Hello from Cyclone, %s\n", args[1]);
    }
    return 0;
}
Thanks
Most of this page is a re-edit of the "Cyclone: a safe dialect of C" (http://www.research.att.com/projects/cyclone/papers/cyclone-safety.pdf) document. Many thanks to Trevor Jim, Greg Morrisett, Dan Grossman, Michael Hicks, James Cheney and Yanling Wang for creating great software and great documents to spread the word with.
External links
- A Safe Dialect of C (http://www.eecs.harvard.edu/~greg/cyclone/), or the alternative from AT&T's website [2] (http://www.research.att.com/projects/cyclone/)