Newline
In computing, a newline is a special character or sequence of characters indicating the end of a line. The name comes from the fact that the next character will appear on a new line, that is, on the line below the text immediately preceding the newline.
Depending on the style used, "single-spaced" text may contain one newline between adjacent paragraphs, in which case the second paragraph is usually indented, or two newlines between paragraphs with no indenting (the default in web browsers). Other variations exist.
Methods
Software applications and operating systems usually represent the newline with one or two control characters. Most systems use either LF (Line Feed) or CR (Carriage Return) individually, or CR followed by LF (represented by CR+LF).
When a program stores text in a file using ASCII or an ASCII-compatible 8-bit encoding, as is typically the default, these characters are represented by the single ASCII bytes 0A, 0D, or 0D followed by 0A, respectively.
Some mainframe operating systems still use EBCDIC, an archaic 8-bit IBM encoding that is completely incompatible with ASCII. In EBCDIC, the "Next Line" (NEL) code at X'15' is the typical end-of-line character.
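The byte values above can be checked with a short sketch. It assumes Python's built-in `ascii` and `cp037` codecs, where `cp037` is just one EBCDIC code page picked for illustration; the helper name `newline_bytes` is hypothetical.

```python
# Map each newline name to the characters it consists of.
NEWLINES = {
    "LF": "\n",
    "CR": "\r",
    "CR+LF": "\r\n",
    "NEL": "\x85",  # Unicode U+0085, "Next Line"
}


def newline_bytes(name: str, encoding: str) -> str:
    """Return the encoded bytes of the named newline as uppercase hex."""
    return NEWLINES[name].encode(encoding).hex().upper()


# ASCII representations of the three common conventions.
for name in ("LF", "CR", "CR+LF"):
    print(name, newline_bytes(name, "ascii"))

# NEL in one EBCDIC code page (cp037 is an assumption, not the only choice).
print("NEL", newline_bytes("NEL", "cp037"))
```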
On rare occasions, newlines are also called line anchors or line breakers, reflecting differing views of the purpose of newlines. A newline may be considered a line separator, a line terminator or an end of line. This is a fine distinction that is likely to affect only those writing programs which are highly sensitive to the exact structure of a text file. It is similar to the question of whether semicolons separate or terminate statements in the syntax of programming languages. Although most of the time the difference in interpretation is inconsequential, a newline at the end of a file can be troublesome, especially when counting the number of lines. Most programs do not consider such a newline to introduce a new line after that character, but some do.
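The separator and terminator views give different line counts for the same text. A minimal sketch using two standard Python string methods, with a made-up two-line sample:

```python
text = "first\nsecond\n"  # two lines, with a newline at the end of the file

# Separator view: the final newline separates "second" from an empty third line.
as_separator = text.split("\n")

# Terminator view: each newline ends a line, so nothing follows the last one.
as_terminator = text.splitlines()

print(len(as_separator), as_separator)
print(len(as_terminator), as_terminator)
```

A line counter built on the first view would report three lines for this file, while one built on the second would report two.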
History
ASCII was developed simultaneously by the ISO and the ASA, the predecessor organization to ANSI. During the period 1963-1968, the ISO draft standards supported the use of either CR+LF or LF alone as a newline, while the ASA drafts supported only CR+LF. The Multics operating system began development in 1964 and used LF alone as its newline. Unix followed the Multics practice, and later systems followed Unix.
The sequence CR+LF was in common use on many early computer systems that had adapted teletype machines, typically a 33ASR, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems text was often routinely composed to be compatible with these printers. The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send CR+LF+NUL (ending with the control character indicating "do nothing") to be sure that the print head had stopped bouncing. Once these mechanical systems were replaced, the two-character sequence had no functional significance, but it has persisted in some systems anyway.
It has been speculated that QDOS (which Microsoft purchased and renamed MS-DOS) adopted CR+LF as its newline in order to copy the convention used by CP/M. Further speculation holds that CP/M chose CR+LF to introduce a deliberate incompatibility with Unix, in order to mitigate a possible lawsuit by AT&T/Bell over violating their Unix copyrights, as CP/M was loosely modeled on Unix. This convention was inherited by Microsoft's later Windows operating system.
Variations in conventions
The following list demonstrates the variations in the end-of-line conventions among operating systems:
NEL
CR+LF
- CP/M
- MP/M
- DOS
- Microsoft Windows
- Network Virtual Terminal (the standard presentation for text-based Internet protocols)
CR
- Apple II family
- Mac OS through version 9
LF
Unclassified
- Cygwin - Depends on how it was installed; either LF or CR+LF
- Virtual Memory System (VMS) - Has many text file formats. The default is "Variable Length Record". The format is specified by the "Record format" field of the file's directory entry: Variable Length Record, CR+LF, CR, LF, Fixed Length Record, etc.
Recognising and fixing incompatibilities
- When a Unix or Macintosh text file is displayed on a Windows machine, it appears as one long line with strange characters, often black rectangles, where the line breaks should be.
- When a Windows text file is displayed on a Unix or Mac OS X machine, every line appears to end with a superfluous CR, which often appears as ^M.
- When a Windows text file is displayed on a pre-Mac OS X Macintosh, every line appears to begin with a superfluous LF.
Some software automatically compensates for incompatible line endings, but other software does not. When dealing with text files that are not in the correct format for your operating system, it helps to have a program that can correct them. For example, Notepad, the standard plain text editor pre-installed on all Windows machines, does not automatically convert Unix or Macintosh text files, but the DOS edit command (also pre-installed on all Windows machines) does convert Unix- or Macintosh-style line endings and allows the file to be re-saved with Windows-style CR+LF line endings.
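A correction program of the kind described can be quite small. The sketch below is one possible approach, not how any particular editor works; the function names `detect_newline` and `convert_newlines` are hypothetical. It works on raw bytes so that no newline translation happens behind the scenes.

```python
import re


def detect_newline(data: bytes) -> str:
    """Guess the dominant newline convention of a byte string."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CR+LF pair
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CR+LF pair
    counts = {"CR+LF": crlf, "CR": cr, "LF": lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"


def convert_newlines(data: bytes, target: bytes = b"\n") -> bytes:
    """Rewrite every CR+LF, lone CR, or lone LF as `target`."""
    # CR+LF is listed first so the pair is replaced once, not twice.
    return re.sub(rb"\r\n|\r|\n", lambda match: target, data)


sample = b"alpha\r\nbeta\r\ngamma\r\n"
print(detect_newline(sample))
print(convert_newlines(sample))
```

Passing `target=b"\r\n"` converts in the other direction, for example when preparing a Unix file for a Windows-only tool.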
C newline
In the C standard I/O library, files can be accessed in either text or binary mode. When performing input or output in text mode on a system where lines are not terminated by the C newline character (\n), the native line termination is automatically translated into a C newline. (This is a legacy of C's historic grounding in Unix, where there is no need for such a distinction.)
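Python's built-in `open` offers a comparable text/binary split, which makes the translation easy to observe. The sketch below illustrates the idea in Python rather than C; the helper `roundtrip` is hypothetical, and the `newline` argument is used to simulate a platform whose native line ending is CR+LF.

```python
import os
import tempfile


def roundtrip(text: str, native_newline: str) -> tuple:
    """Write text with a chosen on-disk newline, then read it back both ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode, writing: every "\n" is translated to native_newline.
        with open(path, "w", encoding="ascii", newline=native_newline) as f:
            f.write(text)
        # Binary mode: the bytes are returned exactly as stored.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode, reading with newline=None: line endings become "\n".
        with open(path, "r", encoding="ascii", newline=None) as f:
            translated = f.read()
    finally:
        os.remove(path)
    return raw, translated


raw, translated = roundtrip("one\ntwo\n", "\r\n")
print(raw)
print(repr(translated))
```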
Unicode Standard
Because newline handling varies so widely, the Unicode standard prescribes guidelines for handling newlines. The following table shows which characters should be considered a newline.
Acronym | Name | Unicode | ISO-8859-1 | EBCDIC | Open MVS EBCDIC
---|---|---|---|---|---
CR | carriage return | 000D | 0D | 0D | 0D
LF | line feed | 000A | 0A | 25 | 15
CR+LF | carriage return and line feed | 000D,000A | 0D,0A | 0D,25 | 0D,15
NEL | next line | 0085 | 85 | 15 | 25
VT | vertical tab | 000B | 0B | 0B | 0B
FF | form feed | 000C | 0C | 0C | 0C
LS | line separator | 2028 | | |
PS | paragraph separator | 2029 | | |
There are two columns for the EBCDIC codepages because IBM's MVS Open Edition swaps the meaning of NEL and LF. No other EBCDIC-based system performs this swap. Generally, EBCDIC mainframes only use NEL for a newline, and all other newlines are almost never used natively.
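A program that wants to honour every newline in the table can split on the whole set at once. The following is a minimal sketch, assuming the text has already been decoded to Unicode; the pattern `UNICODE_NEWLINE` and the function `split_unicode_lines` are hypothetical names.

```python
import re

# Newline characters from the table. CR+LF comes first so that the pair is
# consumed as a single newline rather than as two.
UNICODE_NEWLINE = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def split_unicode_lines(text: str) -> list:
    """Split text on any newline from the table, treating each as a terminator."""
    lines = UNICODE_NEWLINE.split(text)
    # A trailing newline terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_unicode_lines("a\r\nb\x85c\u2028d\n"))
```

Python's own `str.splitlines` recognises these characters as well, plus the file, group and record separators (U+001C to U+001E), so it is not an exact match for the table.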
Internet Protocols
Because internet protocols involve direct connections between different systems, many network protocols must deal directly with the variation in newline sequences. A number of conventions have evolved, and most protocols specify what newline sequence(s) to use.
For example, HTTP specifies that the request and response headers used for communication between the client and server should end in CR+LF for maximum compatibility, although the body of the document transferred can use whatever is its native newline convention. However, the appendix on [Tolerant applications (http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3)] in the HTTP specifications suggests relying on the LF character alone as a line terminator, ignoring any leading CR, so that in effect both CR+LF and LF alone are acceptable newline sequences. In practice, many applications do just specify the C newline character, which can produce either sequence depending on the platform configuration.
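The tolerant approach can be sketched as follows: split on LF and drop a CR that immediately precedes it, so both conventions yield the same lines. This is an illustration only, not a complete HTTP parser; `split_header_lines` is a hypothetical name and the header values are made up.

```python
def split_header_lines(raw: bytes) -> list:
    """Split a header block on LF, dropping one CR directly before each LF."""
    lines = []
    for line in raw.split(b"\n"):
        if line.endswith(b"\r"):
            line = line[:-1]
        lines.append(line)
    # The piece after the final terminator is empty and is not a line.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


# One header ends in CR+LF, the other in a bare LF.
print(split_header_lines(b"Host: example.org\r\nAccept: */*\n"))
```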