C trigraph
In the C family of programming languages, a trigraph is a sequence of three characters, the first two of which are question marks, that represents a single character.
Rationale
The basic character set of C is a subset of the ASCII character set, but nine of its characters lie outside the smaller invariant character set of ISO 646. The ANSI C committee invented trigraphs so that programs could be written using only the ISO 646 invariant characters.
Trigraph Sequences
The C preprocessor replaces all occurrences of the following nine trigraph sequences with their single-character equivalents before any other processing.
Trigraph  Equivalent
========  ==========
??=       #
??/       \
??'       ^
??(       [
??)       ]
??!       |
??<       {
??>       }
??-       ~
Note that ??? is not a trigraph sequence.
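The replacement rule in the table can be sketched as a small C routine. This is only an illustration under stated assumptions, not how any particular preprocessor is implemented: the names trigraph_equivalent and replace_trigraphs are hypothetical, and the routine works on an in-memory string rather than on a source file. The lookup strings are written so that the sketch itself contains no trigraph sequences.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: given the third character of a candidate
   trigraph, return its single-character equivalent, or 0 if the
   candidate is not one of the nine trigraphs. */
static char trigraph_equivalent(char third)
{
    static const char from[] = "=/'()!<>-";
    static const char to[]   = "#\\^[]|{}~";
    const char *hit;

    if (third == '\0')
        return 0;               /* strchr would match the terminator */
    hit = strchr(from, third);
    return hit ? to[hit - from] : 0;
}

/* Hypothetical helper: copy `in` to `out`, replacing every trigraph.
   The result never grows, so `out` needs strlen(in) + 1 bytes. */
static void replace_trigraphs(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char repl = 0;

        /* in[2] is safe to read: in[1] is '?', so it is not the end. */
        if (in[0] == '?' && in[1] == '?')
            repl = trigraph_equivalent(in[2]);

        if (repl != 0) {
            *out++ = repl;      /* three input characters become one */
            in += 3;
        } else {
            *out++ = *in++;     /* ordinary character, copied as-is */
        }
    }
    *out = '\0';
}
```

Because the scan advances one character at a time when no trigraph matches, three question marks in a row are copied unchanged, while the four-character input ???= yields ?# (the first question mark is kept and the remaining three characters form the trigraph for #).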
The ??/ trigraph is replaced by a backslash, so when it appears at the end of a line it introduces an escaped newline for line splicing; this makes correct and efficient handling of trigraphs within the preprocessor particularly problematic.
Trigraphs are rarely used outside compiler test suites. Many compilers either have an option to turn recognition of trigraphs off, or disable trigraphs by default and have an option to turn them on. Some can issue warnings when they encounter trigraphs in source files.
Disambiguation
You may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where you might want two question marks in a row are character and string literals, and comments.
To safely place two consecutive question marks within literals, you should use the escape sequence ?\?.
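As a minimal sketch of this advice, the declarations below spell strings that contain two adjacent question marks without ever writing them side by side in the source. The identifier names and the helper is_two_question_marks are hypothetical. The second form relies on adjacent string literals being concatenated only after trigraph replacement has already taken place, so it is an alternative to the escape sequence rather than something the text above prescribes.

```c
#include <assert.h>
#include <string.h>

/* Both arrays hold the two-character string made of two question marks,
   whether or not the compiler has trigraph replacement enabled. */
static const char escaped[] = "?\?";    /* \? is an escape for '?' */
static const char joined[]  = "?" "?";  /* adjacent literals are joined later */

/* Written without the escape, the end of this message would be read
   as the trigraph for '|' by a compiler that recognises trigraphs. */
static const char prompt[] = "Really quit?\?!";

/* Hypothetical helper: check that `s` is exactly two question marks. */
static int is_two_question_marks(const char *s)
{
    assert(s != NULL);
    return strlen(s) == 2 && s[0] == '?' && s[1] == '?';
}
```

Comments cannot use escape sequences, so there the practical options are to reword the text or to avoid ending a line with a sequence that would become a backslash.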
The ??/ trigraph forms an escaped newline when it is immediately followed by a newline. This can cause surprises, particularly within comments. For example:
// Will the next line be executed ????????????????/
a++;
which is a single logical comment line, and
/??/
* A comment *??/
/
which is a correctly formed block comment.
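The two examples can be checked with a rough model of the first two steps of translation: trigraph replacement, then line splicing. The sketch below is an assumption-laden illustration, not a real preprocessor; replace_trigraphs, splice_lines and translate are hypothetical names, and the model ignores everything else a compiler does.

```c
#include <assert.h>
#include <string.h>

/* Step 1 (sketch): replace the nine trigraphs. The result never
   grows, so `out` needs strlen(in) + 1 bytes. */
static void replace_trigraphs(const char *in, char *out)
{
    static const char from[] = "=/'()!<>-";
    static const char to[]   = "#\\^[]|{}~";

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        const char *hit = NULL;

        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(from, in[2]);

        if (hit != NULL) {
            *out++ = to[hit - from];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Step 2 (sketch): delete every backslash that is immediately
   followed by a newline, together with that newline, in place. */
static void splice_lines(char *s)
{
    char *dst = s;

    assert(s != NULL);
    while (*s != '\0') {
        if (s[0] == '\\' && s[1] == '\n')
            s += 2;             /* drop the escaped newline */
        else
            *dst++ = *s++;
    }
    *dst = '\0';
}

/* Hypothetical wrapper: run both steps in order on `in`. */
static void translate(const char *in, char *out)
{
    replace_trigraphs(in, out);
    splice_lines(out);
}
```

Running translate on the first example leaves a single line that still starts with //, so the a++; statement has become part of the comment. Running it on the second example yields the text /* A comment */ on one line. Swapping the order of the two steps in this model would leave both examples unspliced, which illustrates why the replacement has to come first.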