Fused multiply-add
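In computing, a fused multiply-add (FMA) is a floating-point multiply-accumulate operation that computes
- FMA(A, B, C) = AB + C
with a single rounding. The exact value of AB + C is rounded once to the destination format, whereas a separate multiply followed by an add rounds twice.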
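The effect of the single rounding can be seen with a small sketch. It assumes no FMA hardware is available and uses a hypothetical software stand-in, `fma`, that computes AB + C exactly with `fractions.Fraction` and rounds once when converting back to a float. Python 3.13 and later also provide `math.fma`. With A = B = 1 + 2^-27, the exact square is 1 + 2^-26 + 2^-54. A separate multiply rounds away the 2^-54 term before the add, but the fused form keeps it.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Software stand-in for a hardware FMA (an assumption, not a real FMA unit).

    Fraction arithmetic on floats is exact, and float() rounds the exact
    result once to the nearest double.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1 + 2**-27          # exactly representable
c = -(1 + 2**-26)       # exactly representable

separate = a * a + c    # a*a is rounded to 1 + 2**-26 first, so the sum is 0.0
fused = fma(a, a, c)    # exact 2**-54, rounded once (it is representable)
print(separate, fused)
```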
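When implemented in a microprocessor, an FMA is typically faster than a multiply operation followed by a separate add. Because the product is not rounded before the addition, FMA also makes it possible to recover the low-order part of a product exactly. For example, with
- H = FMA(A, B, 0.0)
- L = FMA(A, B, −H)
H is the ordinary rounded product and L is its rounding error, so that H + L = AB exactly (barring overflow or underflow).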
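The two-step splitting can be sketched as follows. This is a minimal sketch that again assumes a software stand-in `fma` built on exact `Fraction` arithmetic. The helper name `two_product` is hypothetical. With A = 1 + 2^-30 and B = 1 − 2^-30, the exact product 1 − 2^-60 rounds to 1.0, and L recovers the lost −2^-60.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Software stand-in for a hardware FMA: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_product(a: float, b: float) -> tuple[float, float]:
    """Hypothetical helper: split a*b into a rounded high part and its error."""
    h = fma(a, b, 0.0)   # H = FMA(A, B, 0.0): the ordinary rounded product
    l = fma(a, b, -h)    # L = FMA(A, B, -H): AB - H, exact barring underflow
    return h, l


a = 1 + 2**-30
b = 1 - 2**-30
h, l = two_product(a, b)
# The exact product 1 - 2**-60 rounds to h = 1.0; l holds the lost -2**-60.
print(h, l)
```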
FMA is implemented in hardware on the PowerPC and Itanium processor families. With a fast FMA, a processor does not strictly need a dedicated hardware divide or square-root unit, because both operations can be implemented in software as FMA-based iterative refinements. Itanium, for example, takes this approach.
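Software division can be sketched with Newton-Raphson iteration on the reciprocal, followed by one FMA-based correction of the quotient. The sketch rests on several assumptions. It uses the software `fma` stand-in and a hypothetical `fma_divide` helper. It starts from the textbook linear estimate 48/17 − (32/17)·m, which has a relative error of at most 1/17 on [0.5, 1], and it ignores overflow, underflow, zero and non-finite inputs. Real hardware sequences, such as Itanium's, begin from a table-based reciprocal estimate and are tuned to guarantee correct rounding. This sketch only aims to land within an ulp or so.

```python
import math
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Software stand-in for a hardware FMA: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma_divide(a: float, b: float) -> float:
    """Hypothetical helper: approximate a / b using multiplies, FMAs and scaling.

    Assumes b is finite and nonzero and that no overflow or underflow occurs.
    """
    m, e = math.frexp(abs(b))          # |b| = m * 2**e with 0.5 <= m < 1
    # Linear initial estimate of 1/m on [0.5, 1]; relative error at most 1/17.
    # 48/17 and 32/17 are fixed constants, not a division of the inputs.
    y = 48 / 17 - (32 / 17) * m
    for _ in range(4):                 # each step roughly squares the error
        err = fma(-m, y, 1.0)          # err = 1 - m*y, rounded once
        y = fma(y, err, y)             # y = y + y*err
    y = math.copysign(math.ldexp(y, -e), b)   # undo scaling: y approximates 1/b
    q = a * y                          # first quotient estimate
    r = fma(-b, q, a)                  # residual a - b*q, rounded once
    return fma(r, y, q)                # corrected quotient q + r*y


print(fma_divide(355.0, 113.0), 355.0 / 113.0)
```

Four iterations suffice because the error bound shrinks from 1/17 to roughly 17^-16, well below double precision. Square root can be handled the same way by iterating on the reciprocal square root.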
A fast FMA can both speed up and improve the accuracy of many computations that involve the accumulation of products:
- Dot product
- Matrix multiplication
- Polynomial evaluation (e.g., with Horner's rule)
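In both the dot product and Horner's rule, each step has the form "accumulator times something plus something", which maps directly onto one FMA per step. The sketch below assumes the same software `fma` stand-in. The helper names `dot` and `horner` are hypothetical, and `horner` takes coefficients from the highest degree down.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Software stand-in for a hardware FMA: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def dot(xs: list[float], ys: list[float]) -> float:
    """Dot product with one FMA (a single rounding) per term."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)   # acc = x*y + acc
    return acc


def horner(coeffs: list[float], x: float) -> float:
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    p = 0.0
    for c in coeffs:
        p = fma(p, x, c)       # p = p*x + c
    return p


# x**2 - 3x + 2 at x = 1.5, and a small dot product.
print(horner([1.0, -3.0, 2.0], 1.5), dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Compared with a separate multiply and add, each loop iteration saves one rounding, so the accumulated rounding error is roughly halved.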
The FMA operation was added to the IEEE 754 standard in its 2008 revision (IEEE 754-2008, developed under the name IEEE 754r).