IBM Floating Point Architecture
IBM System/360 computers, and subsequent machines based on that architecture (mainframes), support a hexadecimal floating-point format. The format is used by SAS Transport files as required by the FDA for New Drug Application (NDA) study submissions. See TS-140 [1] (http://support.sas.com/techsup/technote/ts140.html). It is also used in GRIB data files to exchange the output of weather prediction models.
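Hexadecimal floating-point takes a similar approach to IEEE 754 binary floating-point (a sign, a biased exponent and a fraction), but differs in important ways: the exponent is a power of 16 rather than 2, the exponent field is shorter (7 bits), and the fraction field is longer, with no hidden leading bit.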
Single-precision 32 bit
A single-precision hexadecimal floating-point number is stored in a 32-bit word:
 1    7              24              width in bits
+-+-------+------------------------+
|S|  Exp  |        Fraction        |
+-+-------+------------------------+
31 30   24 23                     0  bit index (0 on right)
Exponent bias: +64
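Note that in this format the leading bit of the fraction is stored explicitly rather than suppressed (there is no hidden bit), and the radix point lies to the left of the fraction. Because the exponent is a power of 16, changing it by one moves the radix point by 4 bits (one hexadecimal digit).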
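The layout above implies that a stored word represents the value (-1)^S × 0.Fraction × 16^(Exp − 64). As a minimal sketch of that reading, assuming the 32-bit word is already available as a Python integer, a decoder might look like this (the function name `ibm32_to_float` is illustrative, not part of any library):

```python
def ibm32_to_float(word: int) -> float:
    """Decode a 32-bit IBM hexadecimal floating-point word given as an int."""
    sign = (word >> 31) & 0x1        # bit 31
    exponent = (word >> 24) & 0x7F   # bits 30..24, stored with a bias of 64
    fraction = word & 0x00FFFFFF     # bits 23..0, radix point to their left

    # Interpret the 24 fraction bits as 0.F, then scale by a power of 16.
    magnitude = (fraction / float(1 << 24)) * 16.0 ** (exponent - 64)
    return -magnitude if sign else magnitude


# Exp = 0x41 (65 - 64 = 1) and Fraction = 0x100000 (1/16): 1/16 * 16^1
print(ibm32_to_float(0x41100000))  # 1.0
```

The result is returned as a Python float (an IEEE 754 double), which can hold any single-precision hexadecimal value exactly because both the 24-bit fraction and the exponent range fit comfortably.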
An Example
Let us encode the number −118.625 using the IBM floating-point system.
We need to get the sign, the exponent and the fraction.
Because it is a negative number, the sign is "1". Let's find the others.
First, we write the number (without the sign) in binary notation (see the binary numeral system for how to do this). The result is 1110110.101.
Now, let's move the radix point left, moving four bits at a time (because exponents are to the power of 16, not 2): 1110110.101 = .01110110101 × 16^2
The fraction is the part to the right of the radix point, padded with zeros on the right until all 24 bits are filled. That is 011101101010000000000000.
The exponent is 2, but we need to bias it and convert it to binary (biasing makes the most negative exponent 0, so that all stored exponents are non-negative binary numbers). For the System/360 format, the bias is 64, so 2 + 64 = 66. In binary, using all 7 bits of the exponent field, this is written as 1000010.
Putting them all together:
 1    7              24              width in bits
+-+-------+------------------------+
|S|  Exp  |        Fraction        |
|1|1000010|011101101010000000000000|
+-+-------+------------------------+
31 30   24 23                     0  bit index (0 on right)
Exponent bias: +64
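The steps of this example can be sketched in Python. This is only an illustration under stated assumptions: the function name `float_to_ibm32` is hypothetical, excess low-order bits are truncated rather than rounded, and exponents outside the 7-bit range are simply rejected.

```python
import math


def float_to_ibm32(value: float) -> int:
    """Encode a Python float as a 32-bit IBM hexadecimal floating-point word."""
    if not math.isfinite(value):
        raise ValueError("the format has no encoding for infinities or NaN")

    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    if magnitude == 0.0:
        return sign << 31

    # Move the radix point four bits (one hex digit) at a time until
    # 1/16 <= magnitude < 1, counting the moves in the exponent.
    exponent = 0
    while magnitude >= 1.0:
        magnitude /= 16.0
        exponent += 1
    while magnitude < 1.0 / 16.0:
        magnitude *= 16.0
        exponent -= 1

    fraction = int(magnitude * (1 << 24))  # keep 24 bits, truncating the rest
    biased = exponent + 64                 # apply the bias of 64
    if not 0 <= biased <= 127:
        raise OverflowError("exponent does not fit in 7 bits")

    return (sign << 31) | (biased << 24) | fraction


print(hex(float_to_ibm32(-118.625)))  # 0xc276a000
```

The hexadecimal word 0xC276A000 is the same bit pattern as in the diagram: sign 1, exponent 1000010, fraction 0111 0110 1010 followed by twelve zeros.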
Double-precision 64 bit
Double-precision is the same except that the mantissa (fraction) field is wider:
 1    7                               56                            width in bits
+-+-------+--------------------------------------------------------+
|S|  Exp  |                        Fraction                        |
+-+-------+--------------------------------------------------------+
63 62   56 55                                                     0  bit index (0 on right)
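A double-precision word can be read the same way as a single-precision one, with only the fraction width changed. The sketch below is an illustration, not a reference implementation: the name `ibm64_to_fraction` is hypothetical, and it returns an exact `fractions.Fraction` because a 56-bit fraction does not always fit in the 53-bit significand of a Python float.

```python
from fractions import Fraction


def ibm64_to_fraction(word: int) -> Fraction:
    """Decode a 64-bit IBM hexadecimal floating-point word to an exact rational."""
    sign = (word >> 63) & 0x1          # bit 63
    exponent = (word >> 56) & 0x7F     # bits 62..56, stored with a bias of 64
    fraction = word & ((1 << 56) - 1)  # bits 55..0, radix point to their left

    value = Fraction(fraction, 1 << 56) * Fraction(16) ** (exponent - 64)
    return -value if sign else value


# -118.625 in double precision: same sign and exponent, fraction padded to 56 bits
result = ibm64_to_fraction(0xC276A00000000000)
print(result, float(result))  # -949/8 -118.625
```

Converting the result with `float()` is convenient, but it may round values that use more than 53 significant bits.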
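As in single precision, the exponent is a power of 16 stored with a bias of 64, the midpoint of the 7-bit field. Even though the base is 16, the 7-bit exponent covers a smaller range (magnitudes up to roughly 16^63, about 7.2 × 10^75) than the 11-bit exponent of IEEE 754 double precision.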
See, for example: Schwarz, CMOS floating-point unit for the S/390 Parallel Enterprise Server G4 (http://www.research.ibm.com/journal/rd/414/schwarz.html)
Since 1998, IBM mainframes have also included binary floating-point units that conform to IEEE 754. When decimal floating-point is added, as planned for later this decade (the 2000s), each machine will have at least three floating-point units.