DVI file format
DVI ("DeVice Independent") is the output file format of the TeX typesetting program, designed by David R. Fuchs in 1979. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer (hence the DVI format's name). DVI files are typically used as input to a second program (called a DVI driver) which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert DVI files to popular document formats (e.g. PostScript, PDF) and for printing. Wikipedia uses a PNG driver to generate graphics for mathematical formulae in articles.
DVI is not a document encryption format, and TeX markup may be at least partially reverse-engineered from DVI files, although this process is unlikely to produce high-level constructs identical to those present in the original markup, especially if the original markup used high-level TeX extensions (e.g. LaTeX).
DVI differs from PostScript and PDF in that it does not support any form of font embedding. (Both PostScript and PDF formats can either embed their fonts inside the documents, or reference external ones.) For a DVI file to be printed or even properly previewed, the fonts it references must be supplied. Also, unlike PostScript, DVI is not a full, Turing-complete programming language, though it does use a limited sort of machine language.
Specification
The DVI format was designed to be compact and easily machine-readable. Toward this end, a DVI file is a sequence of commands which form "a machine-like language", in Knuth's words. Each command begins with an eight-bit opcode, followed by zero or more bytes of parameters. For example, each opcode in the range 0x00 through 0x7F (decimal 0 through 127), set_char_i, takes no parameters: it typesets a single character and moves the implicit cursor right by that character's width. In contrast, opcode 0xF7 (decimal 247), pre (the preamble, which must be the first opcode in the DVI file), takes at least fourteen bytes of parameters, plus an optional comment of up to 255 bytes.
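As an illustration of the opcode-plus-parameters layout, the following Python sketch decodes a pre command. The source text gives only the byte counts; the field names (version, num, den, mag) and the big-endian layout of one byte, three four-byte integers, and a one-byte comment length are taken from the DVItype documentation and should be read as assumptions here. The function name parse_preamble and the sample bytes are hypothetical.

```python
import struct

PRE = 0xF7  # opcode of the preamble command (decimal 247)


def parse_preamble(data: bytes) -> dict:
    """Decode the pre command at the start of a DVI byte string."""
    if len(data) < 15 or data[0] != PRE:
        raise ValueError("data does not start with a complete pre command")
    # Fourteen fixed parameter bytes, big-endian (assumed layout):
    # a 1-byte format version, three 4-byte integers, a 1-byte comment length.
    version, num, den, mag, k = struct.unpack(">BiiiB", data[1:15])
    comment = data[15:15 + k]
    if len(comment) != k:
        raise ValueError("comment is truncated")
    return {
        "version": version,
        "num": num,
        "den": den,
        "mag": mag,
        "comment": comment.decode("ascii", errors="replace"),
    }


# Hand-built sample preamble (illustrative values, not read from a real file).
sample = bytes([PRE, 2]) + struct.pack(">iii", 25400000, 473628672, 1000) + b"\x05hello"
info = parse_preamble(sample)
```

The single length byte before the comment is what limits the comment to 255 bytes.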
In a broader sense, a DVI file consists of a preamble, one or more pages, and a postamble. Six state variables are maintained as a tuple of signed, 32-bit integers: <math>(h,v,w,x,y,z)</math>. h and v are the current horizontal and vertical offsets from the upper-left corner (increasing v moves down the page); w and x hold horizontal spacing values, and y and z hold vertical ones. The tuple can be pushed onto or popped from a stack.
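The state tuple and its stack can be modelled in a few lines of Python. This is a minimal sketch, not a DVI driver: the class and method names are hypothetical, and the behaviour of set_w (store a spacing in w and move right by it) and move_w (move right by the stored w) follows the DVItype description of the w commands, which is an assumption beyond what is stated above. The 32-bit range is not enforced.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """The six DVI state variables (h, v, w, x, y, z)."""
    h: int = 0
    v: int = 0
    w: int = 0
    x: int = 0
    y: int = 0
    z: int = 0


class Machine:
    """Hypothetical model of the state tuple and its stack."""

    def __init__(self) -> None:
        self.state = State()
        self.stack = []

    def push(self) -> None:
        # Save the whole tuple; State is immutable, so no copy is needed.
        self.stack.append(self.state)

    def pop(self) -> None:
        # Restore the most recently saved tuple.
        self.state = self.stack.pop()

    def right(self, b: int) -> None:
        self.state = replace(self.state, h=self.state.h + b)

    def down(self, a: int) -> None:
        # Increasing v moves down the page.
        self.state = replace(self.state, v=self.state.v + a)

    def set_w(self, b: int) -> None:
        # Assumed semantics: remember b in w, then move right by b.
        self.state = replace(self.state, w=b, h=self.state.h + b)

    def move_w(self) -> None:
        # Assumed semantics: move right by the remembered spacing w.
        self.state = replace(self.state, h=self.state.h + self.state.w)


m = Machine()
m.right(100)
m.push()
m.set_w(50)
m.move_w()
m.down(20)
inner = m.state   # position reached inside the push/pop pair
m.pop()
outer = m.state   # position restored by pop
```

Storing a spacing once in w and reusing it is one way the format stays compact: repeated moves of the same size need no parameter bytes.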
Fonts are loaded from TFM files. The fonts themselves are not embedded in the DVI file, only referenced. Each font, once loaded, is referred to by an internal index for increased compactness of the format.
The DVI format also relies on the character encodings of the fonts it references, not on those of the system processing it. This means, for instance, that an EBCDIC-based system can process a DVI file that was generated by an ASCII-based system.
References
- DVIType.web (http://www.ctan.org/tex-archive/systems/knuth/texware/dvitype.web), a DVI parser written in WEB, which contains the full DVI format specification when extracted with WEAVE.
External links
- Description of the DVI file format (http://www.math.umd.edu/~asnowden/comp-cont/dvi.html)