ctype.h
The header ctype.h in the ANSI C Standard Library for the C programming language contains declarations for character classification functions.
History
Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, the following test identifies a letter:
if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z')
which gives a correct result if the character set is ASCII. However, this idiom does not work for other character sets such as EBCDIC, in which the letters do not occupy contiguous code positions.
Programs soon became thick with tests such as the one above, or worse, tests that were almost like it. A programmer can write the same idiom in several different ways, which slows comprehension and increases the chance of errors.
Before long, the idioms were replaced by the functions in <ctype.h>.
Implementation
Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as lookups in a static table rather than as chains of comparisons.
For example, an array of 256 eight-bit integers is created and used as a set of bit flags, where each bit corresponds to a particular property of the character, e.g., isdigit or isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the code could be written as follows:
#define isdigit(x) (TABLE[x] & 1)
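The table approach can be sketched a little more fully. The following is only an illustration, not the code of any particular C library: the names ct_table, ct_init, ct_isdigit and ct_isalpha and the flag layout are hypothetical, the letter ranges assume an ASCII execution character set, and EOF is not handled as the real functions must handle it.

```c
#include <limits.h>

/* Hypothetical flag bits; real libraries choose their own layout. */
enum { CT_DIGIT = 1 << 0, CT_UPPER = 1 << 1, CT_LOWER = 1 << 2 };

/* One entry per unsigned char value, zero-initialized at program start. */
static unsigned char ct_table[UCHAR_MAX + 1];

/* Fill the table once. The letter ranges assume ASCII (an assumption of
   this sketch); the digits are contiguous in every C character set. */
static void ct_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) ct_table[c] |= CT_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) ct_table[c] |= CT_UPPER;
    for (c = 'a'; c <= 'z'; c++) ct_table[c] |= CT_LOWER;
}

/* Each test is a single lookup, so the argument is evaluated exactly once. */
#define ct_isdigit(c) (ct_table[(unsigned char)(c)] & CT_DIGIT)
#define ct_isalpha(c) (ct_table[(unsigned char)(c)] & (CT_UPPER | CT_LOWER))
```

After a call to ct_init(), ct_isdigit('7') is nonzero and ct_isalpha('5') is zero. Adding a new property costs one more flag bit rather than another chain of comparisons.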
Early versions of Linux used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if evaluating x has side effects, for instance if one calls isdigit(x++) or isdigit(run_some_program()). It would not be immediately evident that the argument to isdigit is being evaluated twice. For this reason, the table-based approach is generally used.
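The double evaluation can be made visible with a small sketch. The names RANGE_ISDIGIT, func_isdigit, next_char and calls are hypothetical and exist only for this illustration; next_char stands in for any argument expression with a side effect.

```c
/* Comparison-style macro, as in the text: its argument appears twice. */
#define RANGE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Function form for comparison: the argument is evaluated once, at the call. */
static int func_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Counts how often it has been called, to make each evaluation visible. */
static int calls = 0;

static int next_char(void)
{
    calls++;
    return '5';
}
```

Starting from calls equal to 0, RANGE_ISDIGIT(next_char()) leaves calls at 2, because both comparisons invoke the function, whereas func_isdigit(next_char()) leaves it at 1.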
The difference between these two methods became a point of interest during the SCO v. IBM case. (See Linus Torvalds's kernel mailing list post below.)
The contents of <ctype.h>
The header <ctype.h> contains prototypes for a dozen character classification functions:
isalnum
- test for alphanumeric character
isalpha
- test for alphabetic character
isblank
- test for blank character (new in C99)
iscntrl
- test for control character
isdigit
- test for digit
isgraph
- test for graphic character
islower
- test for lowercase character
isprint
- test for printable character
ispunct
- test for punctuation character
isspace
- test for space character
isupper
- test for uppercase character
isxdigit
- test for hexadecimal digit
All of these functions are of the form int isfunc(int); and return a nonzero number for true and zero for false.
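Because every classification function shares the form int isfunc(int), any of them can be passed to the same helper. The sketch below assumes a hypothetical helper named count_if. It converts each char through unsigned char first, since passing a negative value other than EOF to these functions is undefined behavior.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters of s for which test is nonzero. */
static size_t count_if(const char *s, int (*test)(int))
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Plain char may be signed, so convert through unsigned char. */
        if (test((unsigned char)*s))
            n++;
    }
    return n;
}
```

For the string "ab1 2C!", count_if with isdigit gives 2 and with isalpha gives 3. The helper only checks whether the result is nonzero, since the functions are not required to return exactly 1 for true.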
The header also contains prototypes for two character conversion functions:
tolower
- convert character to lowercase
toupper
- convert character to uppercase
These functions are of the form int tofunc(int); and return the converted character if the argument is a letter with a counterpart in the other case, or the argument unchanged otherwise.
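A common use of the conversion functions is to change the case of a whole string. The helper name str_toupper below is hypothetical, and the sketch assumes the string is writable and null-terminated.

```c
#include <ctype.h>

/* Hypothetical helper: upper-cases a string in place and returns it. */
static char *str_toupper(char *s)
{
    char *p;
    for (p = s; *p != '\0'; p++) {
        /* Non-letters come back unchanged, so no separate test is needed. */
        *p = (char)toupper((unsigned char)*p);
    }
    return s;
}
```

Applied to a buffer holding "ctype.h v2" in the default "C" locale, this yields "CTYPE.H V2": the period, the space and the digit pass through untouched.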
The Single Unix Specification Version 3 adds functions similar to the above:
isascii
- test whether the parameter is between 0 and 127
toascii
- convert a character to 7-bit ASCII by clearing its high-order bits
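Both operations amount to simple bit masks. The sketch below uses hypothetical stand-ins named my_isascii and my_toascii rather than the library functions themselves, since those are not part of standard C and may not be declared on every system.

```c
/* Hypothetical stand-in: nonzero when c lies in the range 0 to 127,
   that is, when no bit above the low seven is set. */
static int my_isascii(int c)
{
    return (c & ~0x7f) == 0;
}

/* Hypothetical stand-in: keeps only the low seven bits of c. */
static int my_toascii(int c)
{
    return c & 0x7f;
}
```

The conversion loses information: my_toascii(0xC1) yields 0x41, so a byte outside the ASCII range is mapped onto an unrelated ASCII character instead of being rejected.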
All of the classification and conversion functions of standard C except isdigit and isxdigit are locale-specific; their behavior may change if the locale changes.
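The locale dependence can be observed by classifying the same byte under different locales. The helper name isalpha_in is hypothetical, and which locale names are available other than "C" depends entirely on the system, so the sketch reports failure rather than assuming that a locale exists.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: returns 1 or 0 according to whether byte c is
   alphabetic under the named locale, or -1 if that locale cannot be
   installed on this system. */
static int isalpha_in(const char *locale_name, int c)
{
    int result;
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    result = isalpha(c) != 0;
    setlocale(LC_CTYPE, "C");   /* go back to the default "C" locale */
    return result;
}
```

In the "C" locale only the 52 basic letters are alphabetic, so isalpha_in("C", 0xE9) is 0. On a system that provides a Latin-1 locale, the same byte may be classified as a letter, while the result of isdigit for it would not change.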
External link
- Explanation of the table method (http://groups.google.com/groups?selm=15FWF-45f-17%40gated-at.bofh.it) by Linus Torvalds