Threaded code
|
For multi-threaded programming, see Thread (computer science)
In computer science, the term threaded code refers to an implementation technique for programming languages that produces very compact code at the expense of some execution speed. In systems with virtual memory (where memory is simulated with a mechanical disk drive), threaded code may be hundreds of times faster than a less-compact design that does not fit in the available physical memory, because disk drives tend to be roughly a thousand times slower than random-access memory (RAM).
Threaded code is used in the Forth and early versions of the B programming languages, as well as many implementations of FORTRAN, BASIC, COBOL and other languages for small minicomputers.
Contents |
How the model evolves
Threading is a form of code consisting entirely of subroutine calls, written without the subroutine call instruction, and processed by an interpreter (Forth) or the CPU (B), which jumps to each successive piece of basic function code in turn.
In early computers, memory was very expensive. So programmers spent a lot of time trying to find ways to squeeze their programs so they would fit in the memory available. Also, the instruction sets were so primitive that even simple operations like printing a character or dividing one number by another number required quite a few instructions.
Instead of writing out every step of such operations in every part of the program where it was needed (possibly using a macro), programmers saved memory by writing out every step of such operations exactly once and placing it in a subroutine. In some computers there are different types of subroutine-call instructions using different lengths of addresses. In these, the programmers could often arrange to use a shorter form by using threaded code.
The top-level application usually consists of nothing but subroutine calls. And many of those subroutines, in turn, also consist of nothing but subroutine calls. This is called STC (Subroutine Threaded Code).
To save even more space, programmers squeezed those lists of subroutine calls into simple lists of subroutine addresses (leaving out the "call" instruction on each one), and used a small interpreter (later called a virtual machine) to call each subroutine in turn. This is called DTC (Direct Threaded Code).
Charles Moore invented an even more compact notation in 1970 for his Forth virtual machine: ITC (indirect threaded code). Originally, Moore invented this because it was easy and fast on NOVA minicomputers, which have an indirection bit in every address. He said (in published remarks, Byte Magazine's Forth Issue) that he found it so convenient that he propagated it into all later Forth designs.
Threading models
Common models of threaded code are:
- Indirect threading: The list of addresses in code point to an address at the start of a data area. The address at the start of the data area points at machine code to use the data. This "extra" pointer in indirect threading serves as an "executable data type" to define custom interpreters for data. The address at the start of each data area points to the code shared by all data areas of that type. Though somewhat slower than other models, the sharing of data-management code can save a lot of space, and no type interpretation is usually needed. Most Forth systems produce indirect-threaded code.
- Direct threading: The addresses in the code are actually the address of machine language. This is a compromise between speed and space. The indirect data pointer is lost, at some loss in the language's flexibility, and this may need to be corrected by a type tag in the data areas, with an auxiliary table. Some Forth systems have produced direct-threaded code. On many machines direct-threading is faster than subroutine threading (see reference below).
- Subroutine threaded code, in which the code is actually a list of subroutine calls. This is an obvious implementation method. Early compilers for ALGOL, Fortran, Cobol and some Forth systems often produced subroutine-threaded code. The code in many of these systems operated on a first-in-first-out stack of operands, which had well-developed compiler theory. Many people believe that this implementation trades space for speed, since subroutine instructions are available in most computer designs. However, in many computer systems, indexed fetches are faster than subroutine calls, and in these computers, direct threading can be faster than subroutine threading.
- Token Threaded Code uses lists of 8 or 12-bit indexes to a table of pointers. Token threaded code is notably compact, without much special effort by a programmer. It is usually half to three-fourths the size of other threaded-codes, which are themselves a quarter to an eighth the size of compiled code. The table's pointers can either be indirect or direct. Some Forth compilers produce token threaded code. Some programmers consider the byte codes used by .NET, Java, Basic and some C compilers to be token-threading.
- Huffman Threaded Code consists of lists of Huffman codes. A Huffman code is a variable length bit string used to identify a unique item. A Huffman-threaded interpreter locates subroutines using an index table or tree of pointers that can be navigated by the Huffman code. Huffman threaded code is one of the most compact representations known for a computer program. Basically the index and codes are organized by measuring the frequency that each subroutine occurs in the code. Frequent calls are given the shortest codes. Operations with approximately equal frequencies are given codes with nearly equal bit-lengths. Most Huffman-threaded systems have been implemented as direct-threaded Forth systems, and used to pack large amounts of slow-running code into small, cheap microcontrollers. Most published uses have been in toys, calculators or watches.
Less often used are
- String threading: In which operations are identified by strings, usually looked-up by a hash table. This was used in Charles Moore's earliest Forth implementations, and also in the University of Illinois's experimental hardware-interpreted computer language.
- return threading
- call threading
Common amenities
Separating the data and return stacks in a machine eliminates a great deal of stack management code, substantially reducing the size of the threaded code. The dual-stack principle was originated three times independently: for Burroughs computers, Forth and Postscript), and is used in some Java virtual machines.
Three registers are often present in a threaded virtual machine. Another one exists for passing data between subroutines ('words'). These are:
- ip or i (instruction pointer)
- w (word pointer)
- rp or r (return stack pointer)
- sp or s (parameter stack pointer for passing parameters between words)
Often, threaded virtual machines such as implementations of the programming language Forth have a simple virtual machine at heart, consisting of three primitives. Those are:
- nest, also called docol
- unnest, or semi_s (;s)
- next
In an indirect-threaded virtual machine, the one given here, the operations are:
next: (ip)+ -> w ; jmp (w)+ nest: ip -> -(rp) ; w -> ip ; next unnest: (rp)+ -> ip ; next
External links
- The Development of the C Language (http://cm.bell-labs.com/cm/cs/who/dmr/chist.html) by Dennis M. Ritchie also puts B in the context of BCPL and C.
- Anton Ertl's explanatory page "What is Threaded Code?" (http://www.complang.tuwien.ac.at/forth/threaded-code.html) describes different threading techniques and provides further references. "Charles Moore invented (indirect) threaded Code in 1970"