IEEE 754r
IEEE 754r is an ongoing revision to the IEEE 754 floating-point standard. The intent of the revision is to extend the standard where it has become necessary, to tighten up areas of the original standard that were left undefined, and to merge in IEEE 854 (the radix-independent floating-point standard).
Where a stricter definition would impose a performance cost on some existing implementations, it is placed in a new section, allowing two levels of implementation.
Revision process
The standard has been under revision since 2000, with a target completion date of December 2005. Participation is open to people with a solid knowledge of floating-point arithmetic. Monthly meetings are held in Silicon Valley. The mailing list reflects ongoing discussions.
Summary of the revisions
The most obvious enhancements to the standard are the addition of 128-bit and decimal formats and some new operations; there have also been significant clarifications in terminology throughout. This summary highlights the major differences in each major section of the standard. Note that the revision is not yet an approved standard, so all these changes are, in effect, proposals.
Scope
The scope has been widened to include decimal formats and arithmetic.
Definitions
Many of the definitions have been rewritten for clarification and consistency. A few terms have been renamed for clarity (for example, denormalized has been renamed to subnormal).
Formats
The specification levels of a floating-point format have been enumerated, to clarify the distinction between
- the theoretical real numbers (a number line)
- the entities which can be represented in the format (a finite set of numbers, together with −0, infinities, and NaN)
- the particular representations of the entities: sign-exponent-significand, etc.
- the bit-pattern (encoding) used.
The sets of representable entities are then explained in detail, showing that they can be treated with the significand being considered either as a fraction or an integer.
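The two views of the significand can be illustrated with an ordinary Python float, which is assumed here to be a binary64 value with a 53-bit significand. This is only a sketch: the helper names are hypothetical, and `math.frexp` scales the fractional significand into [0.5, 1) rather than the [1, 2) range usually used when describing the formats.

```python
import math

P = 53  # assumed significand precision of a Python float (binary64), in bits

def as_fraction_form(x):
    """Return (m, e) with x == m * 2**e and 0.5 <= |m| < 1 (m == 0 for zero)."""
    return math.frexp(x)

def as_integer_form(x):
    """Return (c, q) with x == c * 2**q, where c is an integer and |c| < 2**P."""
    m, e = math.frexp(x)
    c = int(math.ldexp(m, P))  # move the binary point P places right: exact
    return c, e - P

x = 0.15625
m, e = as_fraction_form(x)
c, q = as_integer_form(x)
print(m, e)  # fractional significand and its exponent
print(c, q)  # integer significand and the correspondingly smaller exponent
```

Both pairs denote the same representable entity; only the position of the radix point in the significand, and therefore the exponent, differs.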
A "quad precision" (128-bit) format has been added to the basic binary formats.
Three new decimal formats are described, matching the lengths of the binary formats. These give decimal formats with 7-, 16-, and 34-digit significands, which may be normalized or unnormalized. For maximum range and precision, the formats merge part of the exponent and significand into a combination field, and compress the remainder of the significand using densely packed decimal encoding.
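As a rough illustration of the three significand lengths and of unnormalized values, the following sketch uses Python's `decimal` module. It models only the number of significand digits; the exponent ranges and the bit-level encoding are not modelled, and the `FORMATS` table is a hypothetical stand-in rather than anything defined by the draft.

```python
from decimal import Context, Decimal

# Hypothetical stand-ins for the three decimal formats: significand length only.
FORMATS = {"32-bit": 7, "64-bit": 16, "128-bit": 34}

# 1/3 rounded to each significand length.
third = {name: Context(prec=digits).divide(Decimal(1), Decimal(3))
         for name, digits in FORMATS.items()}
for name, value in third.items():
    print(name, value)

# Unnormalized significands: the same number with different representations.
a = Decimal("1.5")    # coefficient 15, exponent -1
b = Decimal("1.500")  # coefficient 1500, exponent -3
print(a == b)                        # numerically equal
print(a.as_tuple() == b.as_tuple())  # but not the same representation
```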
Rounding
The round-to-nearest, ties away from zero rounding mode has been added (required for decimal operations only).
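The difference between this mode and the existing round-to-nearest, ties-to-even mode shows up only on exact ties. A minimal sketch using Python's `decimal` module, where `ROUND_HALF_UP` rounds ties away from zero and `ROUND_HALF_EVEN` rounds ties to even (the helper `round_to_integer` is introduced here for illustration only):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_integer(text, mode):
    """Round a decimal string to an integer using the given rounding mode."""
    return Decimal(text).quantize(Decimal("1"), rounding=mode)

for text in ("0.5", "1.5", "2.5", "-2.5", "2.4"):
    ties_even = round_to_integer(text, ROUND_HALF_EVEN)
    ties_away = round_to_integer(text, ROUND_HALF_UP)
    print(text, ties_even, ties_away)
```

The two modes agree on 1.5 and 2.4 and differ on 0.5, 2.5, and -2.5.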
Operations
This section has numerous clarifications (notably in the area of comparisons), and several previously recommended operations (quiet copy, negate, abs, and copysign) are now required.
New operations include fused multiply-add (FMA), classification predicates (isnan(x), etc.), various min and max functions (which allow a total ordering), and two decimal-specific operations (samequantum and quantize).
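Some of these operations have close counterparts in Python's `decimal` module, which can serve as an informal illustration. The sketch below assumes a toy 3-digit context so that the single rounding of a fused multiply-add is visible; the method names are those of the Python module, not names taken from the draft.

```python
from decimal import Context, Decimal

ctx = Context(prec=3)  # a tiny 3-digit format, for illustration only
x = Decimal("1.01")
addend = Decimal("-1.02")

# Separate multiply and add: the product 1.0201 is rounded to 3 digits first.
two_roundings = ctx.add(ctx.multiply(x, x), addend)
# Fused multiply-add: x*x + addend is formed exactly, then rounded once.
one_rounding = ctx.fma(x, x, addend)
print(two_roundings, one_rounding)

# quantize: give the first operand the exponent of the second.
price = Decimal("2.675").quantize(Decimal("0.01"))
# samequantum: true when both operands have the same exponent.
print(price,
      price.same_quantum(Decimal("9.99")),
      price.same_quantum(Decimal("9.9")))

# Classification predicates.
print(Decimal("NaN").is_nan(), Decimal("1").is_nan(), Decimal("-0").is_signed())
```

With separate operations the small difference is lost to the intermediate rounding, whereas the fused operation retains it.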
min and max
The min and max operations are defined in such a way that they are commutative (except for the case of two NaNs as inputs). In particular:
- min(+0,-0) = min(-0,+0) = -0
- max(+0,-0) = max(-0,+0) = +0
In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:
- min(x,NaN) = min(NaN,x) = x
- max(x,NaN) = max(NaN,x) = x
In the current draft, these functions are called minnum and maxnum to indicate their preference for a number over a NaN.
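The rules above can be sketched in a few lines of Python. This is an illustration of the behaviour described in this section, not text from the draft: the function bodies and the helper `_is_negative` are assumptions, and signaling NaNs are not modelled because Python floats do not expose them.

```python
import math

def _is_negative(x):
    """True if the sign bit of x is set (distinguishes -0.0 from +0.0)."""
    return math.copysign(1.0, x) < 0

def minnum(x, y):
    if math.isnan(x):
        return y  # prefer the number; if both are NaN the result is NaN
    if math.isnan(y):
        return x
    if x == y:    # includes +0 versus -0, which compare equal
        return x if _is_negative(x) else y
    return x if x < y else y

def maxnum(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    if x == y:
        return y if _is_negative(x) else x
    return x if x > y else y

nan = float("nan")
print(minnum(0.0, -0.0), maxnum(-0.0, 0.0))
print(minnum(1.0, nan), maxnum(nan, 1.0))
```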
Decimal arithmetic
Decimal arithmetic, compatible with that used in Java, C#, PL/I, COBOL, REXX, etc., is also defined in this section.
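Python's `decimal` module follows a similar model of decimal arithmetic and can be used to show, informally, how it differs from binary floating point. The example values are chosen here for illustration.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
print(0.1 + 0.2 == 0.3)

# Decimal arithmetic represents them exactly, so the sum compares equal.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# The exponent of the operands is carried into the result,
# so trailing zeros are preserved.
total = Decimal("1.20") + Decimal("1.30")
print(total)
```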
Correctly-rounded base conversion
Unlike IEEE 854, 754r will require correctly rounded conversion between decimal and binary floating point within a range that depends on the format.
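What "correctly rounded" means for decimal-to-binary conversion can be checked with exact rational arithmetic: the converted value must be at least as close to the decimal input as either neighbouring binary value. The sketch below assumes Python floats are binary64 and that the interpreter's `float()` conversion is correctly rounded (as it is on common CPython builds); the helper names are hypothetical, and only positive finite inputs away from the ends of the range are handled.

```python
import struct
from fractions import Fraction

def neighbours(x):
    """Return the doubles just below and just above a positive finite x."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    below = struct.unpack("<d", struct.pack("<q", bits - 1))[0]
    above = struct.unpack("<d", struct.pack("<q", bits + 1))[0]
    return below, above

def is_nearest_double(text):
    """Check that float(text) is a nearest double to the decimal string."""
    exact = Fraction(text)  # exact rational value of the decimal input
    x = float(text)
    error = abs(Fraction(x) - exact)
    return all(error <= abs(Fraction(n) - exact) for n in neighbours(x))

for text in ("0.1", "1e23", "9007199254740993", "3.141592653589793"):
    print(text, is_nearest_double(text))
```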
Sections 6–8
These sections have been revised, but with no major additions; some aspects remain under discussion.
Non-compatible extensions
This new section defines a second level of conformance to the standard, which specifies extensions compatible with the IEEE 754 standard but which could cause significant performance degradation for existing implementations in some circumstances. Until it gets a section number, it is referred to as section "N".
These include:
- extended rounding position
- a definition of the payload for NaNs, and how NaN payloads are propagated
- the binary encodings of NaNs
- extended operations on NaNs
- narrowing of the definition of underflow
Underflow
In 754 the definition of underflow was that the result is tiny and encounters a loss of accuracy. Two definitions were allowed for the determination of the 'tiny' condition: before or after rounding the infinitely precise result to working precision, with unbounded exponent. Two definitions of loss of accuracy were permitted: inexact result or loss due only to denormalization. No known hardware systems implemented the latter.
In the higher conformance level of 754r, the proposal is that underflow be signaled only when the result is tiny after rounding and is inexact.
Annexes
There are several changes in the annexes; e.g., the traps mechanism has been moved to an annex. Traps were not required by IEEE 754-1985, but many readers of the standard assumed they were. In the revision, there is an attempt to focus more on the functionality that an implementation should provide for dealing with exceptional cases. While traps are one way to implement these features, there are other approaches.
Annex "D" provides guidance to debugger developers on features that are desirable for debugging floating-point code.
Annex "Z" introduces optional datatypes for supporting other fixed-width floating-point formats, as well as arbitrary-precision formats (i.e., where the precision of representation and of rounding is determined at execution time), though this material is being moved into the body of the draft.
New Annexes are currently (2004) under discussion.
Open areas of current work
- Expression evaluation rules and modes for selecting sets of rules.
- Inheritance and propagation of modes (exception handling, presubstitution, rounding) and flags (inexact, underflow, overflow, divide by zero, invalid). The desire is for mode changes to be inherited by a callee without affecting the caller, and for flags to propagate out to the caller.
- Are signaling NaNs used enough to warrant continuing to require them?
- Recommendations to language developers of how to bind items in the standard to features in a language: IEEE 754r/Annex L.
- Defining an interchange format for IEEE 754r/Annex Z.
External links
- Committee working page: IEEE 754: Standard for Binary Floating-Point Arithmetic (http://grouper.ieee.org/groups/754/)
- Current draft: Some Proposals for Revising ANSI/IEEE Std 754-1985 (http://754r.ucbtest.org/)
- Densely Packed Decimal (http://www2.hursley.ibm.com/decimal/DPDecimal.html)
- Prof. Kahan's paper on How Futile are Mindless Assessments of Roundoff in Floating-Point Computation (http://www.cs.berkeley.edu/~wkahan/Mindless.pdf)
- ISO Language Independent Arithmetic Standard (http://www.open-std.org/JTC1/SC22/WG11)
- RFC 1832 - XDR: External Data Representation RFC (http://www.faqs.org/rfcs/rfc1832.html)