Filename extension
A filename extension or filename suffix is an extra set of (usually) alphanumeric characters that is appended to the end of a filename to allow computer users (as well as various pieces of software on the computer system) to quickly determine the type of data stored in the file. It is one of several popular methods for distinguishing between file formats.
File managers such as Windows Explorer can associate an application with almost every filename extension: for example, a text editor for .txt, a word processor for .doc, a web browser for .htm or .html, a PDF viewer or editor for .pdf, a graphics program for .png, .gif or .jpg, and a spreadsheet program for .xls. Some extensions, including .exe, .com, .bat, and .cmd, indicate that the file itself may be executed under Windows.
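As a rough illustration of such associations, the following Python sketch looks up a handler by extension. The table, the `handler_for` name and the returned strings are hypothetical and simply restate the examples above; a real file manager keeps this information in its own configuration.

```python
import os

# Hypothetical association table, taken from the examples in the text.
ASSOCIATIONS = {
    ".txt": "text editor",
    ".doc": "word processor",
    ".htm": "web browser",
    ".html": "web browser",
    ".pdf": "PDF viewer",
    ".png": "graphics program",
    ".gif": "graphics program",
    ".jpg": "graphics program",
    ".xls": "spreadsheet program",
}

# Extensions the text lists as directly executable under Windows.
EXECUTABLE = {".exe", ".com", ".bat", ".cmd"}


def handler_for(filename: str) -> str:
    """Return the kind of program assumed to open the given file."""
    # splitext returns (root, extension); compare case-insensitively.
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXECUTABLE:
        return "run directly"
    return ASSOCIATIONS.get(ext, "no association")
```

For instance, `handler_for("REPORT.DOC")` yields the word processor entry, while a file with an unlisted extension falls through to "no association".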
Filename extensions have been in use for decades, but they gained common usage because the file systems included with DOS and Windows had severe limitations on filenames for many years. They can be considered a type of metadata, and are one of the most visible pieces of such information on modern computer systems.
Historical limitations
Early versions of the FAT filesystem used in DOS and Windows had a limitation that only eleven characters could be used to name files. This 11-character space was divided into two components, normally separated by a period (.). The first part, consisting of up to eight characters, was generally called the filename or the base name, while the up to three remaining characters constituted the extension. This is sometimes referred to as the "8.3" convention, and since the word filename is eight letters long and ext is a reasonable abbreviation for extension, it can be generalized as:
FILENAME.EXT
When doing a file listing, the base name and extension would be separated by spaces, much like this:
Volume in drive A: is LINUX BOOT
Volume Serial Number is 2410-07EF
Directory for A:\

LDLINUX  SYS      5480 1999-04-19  23:24
VMLINUZ         530921 1999-04-19  23:24
BOOT     MSG       559 1999-04-19  23:24
EXPERT   MSG       668 1999-04-19  23:24
GENERAL  MSG       986 1999-04-19  23:24
KICKIT   MSG       979 1999-04-19  23:24
PARAM    MSG       875 1999-04-19  23:24
RESCUE   MSG      1020 1999-04-19  23:24
SYSLINUX CFG       420 1999-04-19  23:24
INITRD   IMG    878502 1999-04-19  23:24
       10 files       1,420,410 bytes
                         35,840 bytes free
This use of spaces often led to confusion with novice DOS users, who thought of the "." as part of the file's identifier, rather than merely a convention for separating the two components of that identifier.
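The 8.3 split and the space-separated listing format can be sketched in Python as follows. The function names are hypothetical, and the validation is deliberately simplified: it checks only the lengths and the single dot, not the set of characters that DOS actually permitted.

```python
def split_8_3(name: str) -> tuple[str, str]:
    """Split an 8.3 name into (base, extension), both upper-cased."""
    base, dot, ext = name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the base, with no extension.
        base, ext = name, ""
    if "." in base or not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError(f"not a valid 8.3 name: {name!r}")
    return base.upper(), ext.upper()


def listing_name(name: str) -> str:
    """Format a name as in a directory listing: base padded to 8
    characters, one space, extension padded to 3 characters."""
    base, ext = split_8_3(name)
    return f"{base:<8} {ext:<3}"
```

Here `listing_name("syslinux.cfg")` gives `SYSLINUX CFG`, while a name such as `index.html` is rejected because its extension has four characters.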
The need for more
The filename extension was originally used to easily determine the file's generic type. The need to condense the type of a file into three characters frequently led to inscrutable extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in a variety of ways, filename extensions started to become closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Also, filename extensions began to conflict between separate file formats. One example is .rpm, used by both the RPM Package Manager and RealPlayer (for RealPlayer Media files); another is .qif, shared by both Quicken Information Files (financial ledgers) and QuickTime Image Format (pictures).
As time went on, hundreds of different extensions came into use, as software developers invented more and more file formats. This led to reference manuals being published, devoted entirely to listing the extensions and the type (or types) of data that might be found in files so named. These issues led to the need for alternative systems with lower chances of conflicts.
Other operating systems, such as Unix and Mac OS, generally had much more liberal standards for filenames. Many allowed full filename lengths of approximately 32 characters, and ranges up to 255 were not uncommon. These systems generally allowed for variable-length filename extensions, and also tended to allow more than one dot, partly because they had additional methods for determining file format information. In fact, Unix does not internally know about filename extensions, and just treats the '.' as a normal character in the filename. As the Internet age arrived, it was possible to discern who was using Windows systems to edit their web pages versus who used Macintosh or Unix computers, since the Windows users were generally restricted to ending their web page filenames in .HTM (instead of .html). This also became a problem with programmers experimenting with the Java programming language, since it required source code files to have the four-letter extension .java and compiled object code output files to have the five-letter .class extension.
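Where the dot is just another character in the name, the extension is purely a naming convention that software recovers by splitting on the dot. A minimal Python sketch using the standard `pathlib` module shows variable-length and multiple extensions; the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def last_extension(name: str) -> str:
    """Return the final extension, including the dot, or '' if none."""
    return PurePosixPath(name).suffix


def all_extensions(name: str) -> list[str]:
    """Return every dot-separated extension, in order."""
    return PurePosixPath(name).suffixes


def fits_three_chars(name: str) -> bool:
    """True if the final extension would fit the three-character limit."""
    return len(last_extension(name).lstrip(".")) <= 3
```

With these helpers, `all_extensions("archive.tar.gz")` returns both `.tar` and `.gz`, and `fits_three_chars` is true for `INDEX.HTM` but false for `Hello.java` and `Hello.class`.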
Eventually, Microsoft introduced long filenames and an extended version of the commonly used FAT file system called VFAT to deal with this issue. Microsoft and IBM had previously collaborated on the High Performance File System (HPFS), used in OS/2, and Windows NT later introduced its own file system, NTFS; neither had such strict limitations. VFAT's long filenames are largely considered to be an ugly kludge, but they removed the important length restriction and allowed files to have a mix of upper case and lower case letters. However, the habit of using three-character extensions has continued, along with the problems it creates.
Security issues
Depending on the settings of the shell or file browser, the file extension may not be shown. Malicious users who spread a computer virus or computer worm may use a file name like LOVE-LETTER-FOR-YOU.TXT.vbs, which then shows up as LOVE-LETTER-FOR-YOU.TXT if the user has file extensions disabled (which is the default behavior of Microsoft's software). Therefore, to a user who has file extensions hidden, this may look like a harmless text file rather than a potentially dangerous computer program written in VBScript.
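The effect of hiding the last extension, and a simple check for this kind of disguise, can be sketched in Python. The set of executable extensions is an illustrative subset rather than a complete list, the function names are hypothetical, and the sketch assumes that every final extension is hidden.

```python
from pathlib import PureWindowsPath

# Illustrative subset only; real systems recognize many more.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".bat", ".cmd", ".vbs"}


def displayed_name(filename: str, hide_extensions: bool = True) -> str:
    """Return the name as a file browser would show it."""
    path = PureWindowsPath(filename)
    # stem drops only the final extension, so an inner ".TXT" remains.
    return path.stem if hide_extensions else path.name


def looks_disguised(filename: str) -> bool:
    """True if an executable extension follows another extension."""
    suffixes = [s.lower() for s in PureWindowsPath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in EXECUTABLE_EXTENSIONS
```

Applied to the example above, `displayed_name("LOVE-LETTER-FOR-YOU.TXT.vbs")` returns `LOVE-LETTER-FOR-YOU.TXT`, and `looks_disguised` flags the file, while an ordinary `notes.txt` is not flagged.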
This issue has become relatively less significant as the number of attack vectors has grown: not only are most users unaware of the more obscure dangerous extensions, but files with extensions previously considered safe (like .TXT and .ZIP) have been used successfully as attack vectors. In the case of .TXT, a file told users that certain system files were malware and urged them to delete those files; in the case of .ZIP, the user extracted a malicious executable from an archive and willingly ran it.
It is often considered the responsibility of the e-mail program to warn the user of dangerous attachments, or to block their execution altogether, to stop at least the former kind of attack; handling the latter is more a matter of education and training, but its impact can be somewhat mitigated with the application of the principle of least privilege (including, but not limited to, sandboxing). Most programs already provide such protection (notably Eudora, which in the latest Windows versions even extends this functionality to the operating system by means of a shell extension).
Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) include a customizable database of file types that could be considered dangerous in certain zones (including, but not limited to, downloads from the WWW and e-mail attachments), which applications can query, and they standardize a common API to invoke antivirus programs. These mechanisms are meant to replace the often inconsistent, conflicting or weak mechanisms that existing applications already have in place, such as certain antivirus software blacklisting scripts as intrinsically dangerous, even more so than native executables. The latter approach is actually a cover-up to hide a well-known weakness of blacklist-based (as opposed to heuristic) antivirus software: malware can evade detection by simply "shifting shape" into a semantically equivalent form, becoming different enough from what the antivirus expects to stay undetected. This technique, usually called polymorphism, is a lot easier and more effective with scripting languages. In short, most antivirus software can only block known malware, making it ineffective against custom (or merely as yet unknown) malware.
Relation to Internet content types
In network contexts, files are regarded as streams of bits and do not have filenames or filename extensions.
In the Internet protocol suite, the type of a given bitstream is encoded in the MIME Content-type of the stream, represented by a header line in the block of text that precedes the stream, such as:
Content-type: text/plain
Some operating systems and desktop environments such as BeOS, KDE or GNOME have started using MIME Content-types to tag files with appropriate metadata about the file content type, as a way of getting out of the dependency on filename extensions. Mapping filename extensions to content-types is then done using different heuristics, such as examining both the filename extension and the contents of the file.
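One such heuristic, checking the start of the file contents first and falling back to the filename extension, might look like the following Python sketch. Both tables are small illustrative samples, the function names are hypothetical, and the `application/octet-stream` fallback is an assumption of this sketch rather than something the text specifies.

```python
import os

# Sample extension-to-type table (illustrative, far from complete).
EXTENSION_TYPES = {
    ".txt": "text/plain",
    ".htm": "text/html",
    ".html": "text/html",
    ".png": "image/png",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
}

# Well-known leading byte signatures for a few formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def guess_content_type(filename: str, head: bytes = b"") -> str:
    """Guess a content type from the first bytes, then the extension."""
    for signature, content_type in MAGIC_NUMBERS:
        if head.startswith(signature):
            return content_type
    ext = os.path.splitext(filename)[1].lower()
    # Assumed fallback for anything unrecognized.
    return EXTENSION_TYPES.get(ext, "application/octet-stream")


def content_type_header(filename: str, head: bytes = b"") -> str:
    """Build the header line that would precede the stream."""
    return f"Content-type: {guess_content_type(filename, head)}"
```

In this sketch the contents take priority, so a PNG image misnamed `picture.txt` is still reported as `image/png`, whereas a file with no recognizable signature is typed by its extension alone.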
See also
- List of file formats (also includes a list of filename extensions)
- File format#By file extension
External links
- Listings of common filename extensions: [1] (http://www.dotwhat.net) [2] (http://filext.com) [3] (http://www.fnds.net/ext/a.html)
- Microsoft Application Search (http://shell.windows.com/fileassoc/0409/xml/redir.asp) (Add &ext= followed by an extension to search for that extension.)
- An explanation of file extensions. (http://www.sharpened.net/helpcenter/extensions.php)