Reconfigurable computing
Reconfigurable computing is computer processing using highly flexible computing fabrics. The principal difference from ordinary microprocessors is the ability to make substantial changes to the data path itself, in addition to the control flow.
History and characteristics
The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's landmark paper proposed the concept of a computer consisting of a standard processor and an array of “reconfigurable” hardware. The main processor would control the behavior of the reconfigurable hardware. The reconfigurable hardware would then be tailored to a specific task, such as image processing or pattern matching, and perform it as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware; unfortunately, this idea was far ahead of its time in terms of electronic technology.
In the last decade there has been a renaissance in this area of research, with many reconfigurable architectures proposed in both industry and academia, such as Matrix, Garp, Elixent, XPP, SiliconHive, Montium, Pleiades and Morphosys. These designs have only become feasible through the relentless progress of silicon technology, which allows complex designs to be implemented on a single chip. The classification of these reconfigurable architectures is still being developed and refined as new architectures emerge; no unifying taxonomy has been suggested to date. However, several recurring parameters can be used to classify these systems.
Granularity
The granularity of the reconfigurable logic is defined as the size of the smallest functional unit that is addressed by the mapping tools. Low granularity, also known as fine-grained, often implies greater flexibility when implementing algorithms in hardware. However, it carries a penalty in increased power, area and delay, due to the greater quantity of routing required per computation. Fine-grained architectures work at the bit-manipulation level, whilst medium-grained processing elements are better optimised for standard data path applications. One of the drawbacks of medium-grained architectures is that they tend to lose some of their utilisation and performance when performing computations smaller than their granularity provides: for example, a one-bit add on a four-bit-wide functional unit wastes three bits.

Very coarse-grained architectures are intended for the implementation of algorithms needing word-width data paths. As their logic blocks are optimised for large computations, they perform these operations more quickly and power-efficiently than a set of smaller functional units connected together with interconnect; the connecting wires are shorter, meaning less wire capacitance and hence faster, lower-power designs. The consequence of larger computational blocks is that operand sizes may not match the algorithm, causing inefficient utilisation of resources. Often the type of applications to be run is known in advance, allowing the logic, memory and routing resources to be tailored to enhance the performance of the device whilst still providing a certain level of flexibility for future adaptation. Examples of this are domain-specific arrays, which aim for better performance in terms of power, area and throughput than their more generic, finer-grained FPGA cousins by reducing their flexibility.
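To make the utilisation penalty concrete, the following sketch (with assumed operation and unit widths, not figures from any particular architecture) computes the fraction of a functional unit's bit lanes actually used by a narrower operation:

```c
#include <stdio.h>

/* Sketch of the utilisation penalty described above; the widths
   are assumed for illustration. An operation narrower than the
   functional unit leaves the remaining bit lanes idle. */
static double utilisation(int op_width, int unit_width) {
    /* assumes op_width <= unit_width; a wider operation would
       need to be split across several units */
    return (double)op_width / (double)unit_width;
}

int main(void) {
    /* a one-bit add on a four-bit unit: 3 of 4 bit lanes wasted */
    printf("1-bit add on 4-bit unit:   %3.0f%% utilised\n",
           100.0 * utilisation(1, 4));
    /* a matched word-width operation wastes nothing */
    printf("32-bit add on 32-bit unit: %3.0f%% utilised\n",
           100.0 * utilisation(32, 32));
    return 0;
}
```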
Rate of reconfiguration
Configuration of these reconfigurable systems can happen at deployment time, between execution phases, or during execution. In a typical reconfigurable system, a bit stream is used to program the device at deployment time. Fine-grained systems, by their nature, require greater configuration time than coarse-grained architectures, since more elements need to be addressed and programmed. Coarse-grained architectures therefore benefit from potentially lower energy requirements, as less information is transferred and utilised. Intuitively, the slower the rate of reconfiguration, the smaller the energy consumption, as the associated energy cost of reconfiguration is amortised over a longer period of time. Partial reconfiguration aims to allow part of the device to be reprogrammed while another part is still performing active computation. Partial reconfiguration allows smaller reconfigurable bit streams, thus not wasting energy on transmitting redundant information in the bit stream. Compression of the bit stream is possible, but careful analysis should be carried out to ensure that the energy saved by using smaller bit streams is not outweighed by the computation needed to decompress the data.
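The amortisation argument can be illustrated with a toy energy model; the 50 mJ cost per reconfiguration below is an assumed figure, not a measurement:

```c
#include <stdio.h>

/* Illustrative model of reconfiguration energy amortisation. The
   longer a configuration stays resident, the smaller the
   reconfiguration overhead per second of useful computation. */
int main(void) {
    const double e_config_mj  = 50.0;   /* energy per reconfiguration, assumed */
    const double residency_s[] = {0.01, 0.1, 1.0, 10.0};
    for (int i = 0; i < 4; i++) {
        double overhead_mj_per_s = e_config_mj / residency_s[i];
        printf("resident %6.2f s -> %8.2f mJ/s reconfiguration overhead\n",
               residency_s[i], overhead_mj_per_s);
    }
    return 0;
}
```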
Host coupling
Often the reconfigurable array is used as a processing accelerator attached to a host processor. The level of coupling determines the type of data transfers, as well as the latency, power, throughput and overheads involved when utilising the reconfigurable logic. Some of the most intuitive designs use a peripheral bus to provide a coprocessor-like arrangement for the reconfigurable array. However, there have also been implementations where the reconfigurable fabric is much closer to the processor; some are even implemented into the data path, utilising the processor registers. The host processor performs the control functions, configures the logic, schedules data and provides external interfacing.
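A hypothetical host-side driver for a loosely coupled, bus-attached array might look like the following sketch. The register layout and names are invented for illustration, and the device is simulated in ordinary memory so the example runs anywhere; on real hardware the struct would be mapped at the device's bus address:

```c
#include <stdint.h>
#include <stdio.h>

/* Invented register map for a coprocessor-style accelerator on a
   peripheral bus (illustration only, not any real device). */
typedef struct {
    volatile uint32_t ctrl;    /* bit 0: start                */
    volatile uint32_t status;  /* bit 0: done                 */
    volatile uint32_t src;     /* source buffer bus address   */
    volatile uint32_t dst;     /* result buffer bus address   */
} accel_regs;

enum { CTRL_START = 1u, STATUS_DONE = 1u };

/* Host-side invocation: each register access is a bus transaction,
   which is why loose coupling adds latency compared with fabric
   placed directly in the processor's data path. */
static void accel_run(accel_regs *regs, uint32_t src, uint32_t dst) {
    regs->src  = src;
    regs->dst  = dst;
    regs->ctrl = CTRL_START;                 /* kick off the array */
    while ((regs->status & STATUS_DONE) == 0)
        ;  /* busy-wait; a real driver might block on an interrupt */
}

int main(void) {
    accel_regs fake = {0};          /* simulated device registers */
    fake.status = STATUS_DONE;      /* pretend the device finishes
                                       instantly, so the poll returns */
    accel_run(&fake, 0x1000u, 0x2000u);
    printf("accelerator done, status = 0x%x\n", (unsigned)fake.status);
    return 0;
}
```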
Routing/interconnects
The flexibility of reconfigurable devices comes mainly from their routing interconnect. One style of interconnect, made popular by the FPGA vendors Xilinx and Altera, is the island-style layout, where blocks are arranged in an array with vertical and horizontal routing. A layout with inadequate routing may suffer from poor flexibility and resource utilisation, and therefore provide only limited performance. If too much interconnect is provided, more transistors than necessary are required, consuming more silicon area, lengthening wires and increasing power consumption.
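This trade-off can be sketched with a back-of-the-envelope wire count for an island-style array; the array size and unit-segment model below are assumptions, not figures from any actual device:

```c
#include <stdio.h>

/* Rough sketch of how interconnect resources grow with channel
   width in an island-style array: an n x n block array has
   (n + 1) horizontal and (n + 1) vertical routing channels, each
   n blocks long and w wires wide (all assumed, unit-length
   segments). Wider channels mean more flexibility but more area. */
int main(void) {
    const int n = 8;                          /* assumed array size */
    for (int w = 4; w <= 32; w *= 2) {
        long segments = 2L * (n + 1) * n * w;
        printf("channel width %2d -> %5ld wire segments\n", w, segments);
    }
    return 0;
}
```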
Tool flow
Generally, tools for configurable computing systems can be split into two parts: CAD tools for the reconfigurable array and compilation tools for the CPU. The front-end compiler is an integrated tool that generates a structural hardware representation, which serves as input to the hardware design flow. The hardware design flow for a reconfigurable architecture can be classified by the approach adopted in the three main stages of the design process: technology mapping, placement and routing. The software frameworks differ in the level of the programming language.
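As an illustration of the placement stage, the following toy placer (a greatly simplified, greedy variant of the annealing-style placers used in FPGA CAD flows, with a randomly generated netlist standing in for a real design) repeatedly swaps blocks on a grid and keeps swaps that reduce total wirelength:

```c
#include <stdio.h>
#include <stdlib.h>

#define BLOCKS 8
#define NETS   10
#define GRID   4

/* each block occupies one (x, y) grid slot */
static int pos_x[BLOCKS], pos_y[BLOCKS];
/* each net connects two blocks (two-pin nets for simplicity) */
static int net_a[NETS], net_b[NETS];

static int abs_i(int v) { return v < 0 ? -v : v; }

/* total Manhattan wirelength over all two-pin nets */
static int wirelength(void) {
    int total = 0;
    for (int i = 0; i < NETS; i++)
        total += abs_i(pos_x[net_a[i]] - pos_x[net_b[i]])
               + abs_i(pos_y[net_a[i]] - pos_y[net_b[i]]);
    return total;
}

static void swap_blocks(int a, int b) {
    int tx = pos_x[a], ty = pos_y[a];
    pos_x[a] = pos_x[b]; pos_y[a] = pos_y[b];
    pos_x[b] = tx;       pos_y[b] = ty;
}

int main(void) {
    srand(1);
    /* initial placement: blocks in row-major grid order */
    for (int i = 0; i < BLOCKS; i++) {
        pos_x[i] = i % GRID;
        pos_y[i] = i / GRID;
    }
    /* random two-pin netlist, a stand-in for real connectivity */
    for (int i = 0; i < NETS; i++) {
        net_a[i] = rand() % BLOCKS;
        net_b[i] = rand() % BLOCKS;
    }
    printf("initial wirelength: %d\n", wirelength());
    /* greedy improvement: try a swap, revert it if it made things worse */
    for (int iter = 0; iter < 10000; iter++) {
        int a = rand() % BLOCKS, b = rand() % BLOCKS;
        int before = wirelength();
        swap_blocks(a, b);
        if (wirelength() > before)
            swap_blocks(a, b);   /* revert */
    }
    printf("final wirelength:   %d\n", wirelength());
    return 0;
}
```

A production placer would accept some worsening moves early on (simulated annealing) to escape local minima; the greedy acceptance rule here keeps the sketch short.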
Some types of reconfigurable computers are microcoded processors where the microcode is stored in RAM or EEPROM, and changeable on reboot or on the fly. This could be done with the AMD 2900 series bit slice processors (on reboot) and later with FPGAs (on the fly).
Some dataflow processors are implemented using reconfigurable computing.
References
- G. Estrin, "Organization of Computer Systems—The Fixed Plus Variable Structure Computer," Proc. Western Joint Computer Conference, New York, 1960, pp. 33–40.