Binary and text files
Computer files can be divided into two broad categories: binary and text. Text files are files which contain ordinary textual characters with essentially no formatting; binary files are all other files. The distinction is not universal because any file is fundamentally a sequence of bits, and many computer components (for example, all hard disk circuitry and most system software) make no distinction between file types. However, a large percentage of application programs can understand and use text files in some way, but few programs can typically understand and use the contents of a particular binary file. Hence the distinction can be useful to computer users.
Text files (or plain text files) are files where most bytes represent ordinary readable characters such as letters and digits, so any simple file viewer can display them in human-readable form. Generally, they contain ASCII characters and some control characters such as tabs, line feeds and carriage returns, without any embedded information such as font information, hyperlinks or inline images. Text files can also contain non-ASCII characters if they use another encoding, such as Shift JIS (SJIS) for Japanese or one of the Unicode encodings. If the file is written in Unicode, a Unicode Transformation Format (UTF) such as UTF-8 defines how the characters are encoded as bytes. Although most text files are meant for humans to read, some are (also) used for data storage by computer programs. Text files are sometimes advantageous even for data storage because they avoid certain problems with binary files, such as endianness or variations in the number of bytes devoted to a number.
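The endianness and width problems mentioned above can be illustrated with a short Python sketch. The value 1000 and the variable names are arbitrary choices for illustration; the standard `struct` module is used only to make the byte order and width explicit.

```python
import struct

value = 1000  # arbitrary example number (0x03E8)

# Binary storage: the bytes depend on the byte order and width the writer chose.
little_endian_32 = struct.pack("<I", value)  # 4 bytes, least significant byte first
big_endian_32 = struct.pack(">I", value)     # 4 bytes, most significant byte first
big_endian_16 = struct.pack(">H", value)     # 2 bytes, most significant byte first

# Text storage: the decimal digits are the same on every platform.
as_text = str(value).encode("ascii")

# A reader that assumes the wrong byte order gets a different number.
misread = struct.unpack(">I", little_endian_32)[0]

print(little_endian_32.hex(), big_endian_32.hex(), big_endian_16.hex())
print(as_text, int(as_text))
print(misread)
```

The text form costs a parsing step, but a reader needs no knowledge of the writer's byte order or integer width to recover the number.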
Text files usually have the MIME type "text/plain", often with a charset parameter indicating the encoding (for example, "text/plain; charset=UTF-8"). Common encodings for plain text include Unicode UTF-8, Unicode UTF-16, the ISO 8859 family, and ASCII.
Plain text is often used as a readable representation of other data that is not itself purely textual: for example, a formatted webpage is not plain text, but its HTML source is. Similarly, source code for computer programs is usually stored in text files, but is compiled into a binary form (described below) for execution.
Transferring text files between Unix, Macintosh, and Microsoft Windows or DOS computers can be problematic, as each platform uses different characters to signify a line break. See new line for a discussion of this confusion. Further cross-platform confusion occurs because many non-Unix systems have traditionally used an Extended ASCII character encoding, where the first 128 byte values conform to ASCII and where the upper 128 byte values are mapped to textual or punctuation characters, such as curly quotes or characters having a diacritical mark. Prior to the advent of Mac OS X, Macintosh users would call a document a text file so long as all of its non-whitespace bytes were printable in the Macintosh environment.
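As a minimal sketch of both problems, the Python fragment below normalizes line breaks and shows one byte decoding differently under two extended ASCII code pages. It assumes the conventional line terminators (LF on Unix, CR LF on DOS and Windows, CR on Mac OS before Mac OS X); the function name `normalize_newlines` and the sample data are hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line breaks to the Unix LF convention."""
    # Replace CR LF first so its CR is not turned into a second line break.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


unix_text = b"one\ntwo\n"
dos_text = b"one\r\ntwo\r\n"   # DOS / Windows convention
mac_text = b"one\rtwo\r"       # Mac OS convention before Mac OS X

# The same upper-range byte maps to different characters on different
# platforms, because each extended ASCII code page assigns it differently.
byte = b"\x93"
on_windows = byte.decode("cp1252")     # Windows code page 1252
on_old_mac = byte.decode("mac_roman")  # classic Macintosh encoding

print(normalize_newlines(dos_text) == unix_text)
print(normalize_newlines(mac_text) == unix_text)
print(repr(on_windows), repr(on_old_mac))
```

Replacing CR LF before lone CR matters: doing it the other way round would turn every DOS line break into two Unix line breaks.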
The similar term plaintext is most commonly used in a cryptographic context, where it means unencrypted data. The two terms are sometimes confused, especially by those new to computers, cryptography, or data communications.
Binary files, in contrast, may contain any data whatsoever, and usually contain many byte values that do not correspond to printable characters. Computer programs are typical examples, as the data and CPU instructions they contain can, in principle, be any binary value. As a result, compiled applications are often simply referred to as binaries. But binary files can also be image files, sound files, compressed versions of other files (of either type), and so on: in short, any file content whatsoever. Some binary file formats even contain some plain text.
Binary files are often rewritten in a plain text representation (using, for example, Base64) to protect them during transmission through certain systems (such as email) that cannot be trusted to handle binary data properly. This encoding has the disadvantage of increasing the file's size by roughly one third (about 33%, since Base64 represents every 3 bytes as 4 characters) during the transfer.
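The size overhead can be checked with Python's standard `base64` module. This is only a sketch: the sample data is arbitrary, and real email transfers add a little more overhead for line breaks.

```python
import base64

# Arbitrary binary sample: every byte value 0..255, repeated (3072 bytes).
raw = bytes(range(256)) * 12

encoded = base64.b64encode(raw)     # output uses only printable ASCII characters
decoded = base64.b64decode(encoded)

overhead = len(encoded) / len(raw)  # 4 output characters per 3 input bytes

print(len(raw), len(encoded), overhead)
print(decoded == raw)
```

Decoding restores the original bytes exactly, so the only cost is the larger size while the data is in its text form.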
It is a common misconception that computer experts can read binary files directly. However, binary is nothing more than a number system. Binary files are usually organized in bytes, which means the binary digits are grouped in eights. If you open a binary file in a text editor, each group of eight bits will be translated as a single character, and you will see a (probably unintelligible) text file (see above). If, however, you were to open it in some other application, that application will have its own use for each byte: maybe the application will treat each byte as a number and output a stream of numbers between 0 and 255, or display a picture by interpreting the numbers as colors. If the file is itself treated as an executable and run, then the computer will attempt to interpret the file as a series of instructions in its machine language.
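The following Python sketch shows the same six bytes interpreted in three ways. The byte values are arbitrary, and Latin-1 stands in for whatever one-byte-per-character encoding a simple text editor might assume.

```python
data = bytes([72, 105, 33, 0, 255, 128])  # arbitrary example bytes

# Interpretation 1: a stream of numbers between 0 and 255.
as_numbers = list(data)

# Interpretation 2: one character per byte, as a simple text editor might do.
as_text = data.decode("latin-1")

# Interpretation 3: the underlying bits, grouped in eights.
as_bits = [format(b, "08b") for b in data]

# Converting a group of eight bits back into an ordinary number.
first_byte = int(as_bits[0], 2)

print(as_numbers)
print(repr(as_text))
print(as_bits)
print(first_byte)
```

The first three bytes happen to be printable and read as "Hi!", while the remaining three have no readable form, which is why a binary file looks like gibberish in a text editor.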
See binary number system to understand how you can convert eight bits into a "normal" number.