Null character
The null character (also null terminator) is a character with the value zero. It is present in the ASCII and Unicode character sets and is available in nearly all mainstream programming languages. Originally it behaved like a NOP (no-operation): when sent to a printer or a terminal, it did nothing, although some terminals incorrectly display it as a space.
The character has special significance in C and its derivatives, where it is reserved to signify the end of a string. In source code the null character is often written as '\0', which is actually an octal escape sequence for the value zero. Strings ending in a null character are said to be null-terminated.
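The following C11 sketch checks these points at compile time. '\0' is a backslash followed by the octal digit 0, so it equals the longer octal escape '\000' and the integer 0. The compiler also appends the terminator to a string literal. The array name `greeting` is purely illustrative.

```c
/* '\0' is an octal escape sequence: a backslash followed by the octal digit 0. */
_Static_assert('\0' == 0, "the null character has the value zero");
_Static_assert('\0' == '\000', "'\\0' and '\\000' denote the same octal escape");

/* The compiler appends a null terminator to every string literal,
   so "hi" occupies three bytes: 'h', 'i', '\0'. */
static const char greeting[] = "hi";
_Static_assert(sizeof greeting == 3, "two characters plus the terminator");
```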
This differs from certain other languages, such as Pascal, which store a string as an array preceded by its length. The main advantage of a null terminator is that a string can be of any length while requiring only one character of additional storage. Null-terminated strings can also be more efficient in some cases: operations that traverse a string do not need to keep track of how many characters have been seen, and operations that change a string's length do not need to update a stored length. Cache performance can also be better.
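A traversal that relies only on the terminator can be sketched as follows. The function `count_char` is a hypothetical helper, not a standard library function. It advances a pointer and stops when it reads '\0'. No index, character counter, or stored length is consulted to decide where the string ends.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many times c occurs in the
   null-terminated string s. The loop ends when it reaches the
   terminating '\0'; no length is needed. */
size_t count_char(const char *s, char c) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            count++;  /* counts matches only, not characters visited */
        }
    }
    return count;
}
```

For example, `count_char("banana", 'a')` visits the six characters and stops at the terminator, returning 3.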
Conversely, the advantage of storing a string's length is that the length is always available in constant time. A program using null-terminated strings must count every character to find the length, which takes linear, O(n), time. Storing the length also allows strings to contain null characters, which can simplify data processing by eliminating special cases. In a null-terminated string, by contrast, the first null character is always interpreted as the end of the string.
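A minimal sketch of a linear-time length function shows both points. The name `naive_strlen` is illustrative and stands in for the standard `strlen`. Every character before the first '\0' must be visited. The array `embedded` is 8 bytes long, but as a C string it is only "abc".

```c
#include <stddef.h>

/* Illustrative stand-in for strlen: walks the string until the
   first '\0', so the cost grows linearly with the string's length. */
size_t naive_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* 8 bytes: 'a','b','c','\0','d','e','f' plus the implicit terminator.
   Because the first null character ends the string, its length is 3. */
static const char embedded[] = "abc\0def";
```

Library implementations of `strlen` may be heavily optimized, but they still have to locate the terminator, so the cost remains proportional to the string's length.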
However, the data type used to store the length also matters. If the length is stored in a single byte, as in Pascal, strings can be at most 255 characters long. Larger data types, on the other hand, take up more space than a null character: a 16-bit length occupies two bytes and a 32-bit length occupies four. When C was designed in the 1970s, storage space was far more constrained than it is today, which strongly influenced the choice of null-terminated strings.
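For contrast, here is a sketch of a Pascal-style "short string" in C. It assumes a layout with one length byte followed by up to 255 characters and no terminator. The type `ShortString` and function `short_string_set` are hypothetical names. Reading the length is a single field access, and embedded null characters are allowed. A 256-character string, however, cannot be represented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length lives in one byte,
   so it can only describe 0..255 characters. No '\0' is stored. */
typedef struct {
    uint8_t len;           /* constant-time length */
    char data[UINT8_MAX];  /* room for at most 255 characters */
} ShortString;

/* Copies n bytes (which may include '\0') into dst.
   Returns false if n does not fit in the one-byte length field. */
bool short_string_set(ShortString *dst, const char *src, size_t n) {
    if (n > UINT8_MAX) {
        return false;
    }
    memcpy(dst->data, src, n);
    dst->len = (uint8_t)n;
    return true;
}
```

Widening `len` to `uint16_t` or `uint32_t` would lift the 255-character limit at the cost of a larger header, which is the trade-off described above.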
The C and C++ standards define these terms as follows:
- "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string literal." (ANSI/ISO 9899:1990, the ANSI C standard, section 5.2.1)
- "A string is a contiguous sequence of characters terminated by and including the first null character." (ANSI/ISO 9899:1990, the ANSI C standard, section 7.1.1)
- "A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character)." (ISO/IEC 14882, the ISO C++ standard, section 17.3.2.1.3.1)