4.10 Functions on floating-point numbers
Recall that a floating-point number consists of a sign s, an exponent e and a mantissa m. The value of the number is (-1)^s * 2^e * m.
Each of the classes cl_F, cl_SF, cl_FF, cl_DF, cl_LF defines the following operations.
type scale_float (const type& x, sintC delta)
type scale_float (const type& x, const cl_I& delta)
Returns x*2^delta. This is more efficient than an explicit multiplication because it copies x and modifies the exponent.
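The same idea can be tried on ordinary hardware floats. The sketch below is not CLN code: it uses `std::ldexp` from the C++ standard library, which also scales by a power of two, and the helper names `scale_double` and `scale_double_examples` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for scale_float on a plain double (not a CLN function).
// std::ldexp(x, delta) computes x * 2^delta.
double scale_double(double x, int delta) {
    return std::ldexp(x, delta);
}

// Small self-check of the helper above.
void scale_double_examples() {
    assert(scale_double(3.0, 4) == 48.0);   // 3 * 2^4
    assert(scale_double(1.0, -1) == 0.5);   // 1 * 2^-1
    assert(scale_double(0.0, 10) == 0.0);   // zero stays zero
}
```

Scaling by a power of two is exact as long as the result neither overflows nor underflows, which is why the comparisons above can use `==`.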
The following functions provide an abstract interface to the underlying representation of floating-point numbers.
sintE float_exponent (const type& x)

Returns the exponent e of x. For x = 0.0, this is 0. For x non-zero, this is the unique integer with 2^(e-1) <= abs(x) < 2^e.
sintL float_radix (const type& x)

Returns the base of the floating-point representation. This is always 2.
type float_sign (const type& x)

Returns the sign s of x as a float. The value is 1 for x >= 0, -1 for x < 0.
uintC float_digits (const type& x)

Returns the number of mantissa bits in the floating-point representation of x, including the hidden bit. The value only depends on the type of x, not on its value.
uintC float_precision (const type& x)

Returns the number of significant mantissa bits in the floating-point representation of x. Since denormalized numbers are not supported, this is the same as float_digits(x) if x is non-zero, and 0 if x = 0.
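To make these conventions concrete, here is a minimal sketch that mirrors them for a plain `double` using only the C++ standard library. None of the functions below belong to CLN; the names `exponent_of`, `sign_of`, `digits_of` and `precision_of` are hypothetical, and the sketch assumes finite input.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical analogues for a plain double; none of these are CLN functions.

// Exponent e with 2^(e-1) <= |x| < 2^e for non-zero x, and 0 for x = 0.
int exponent_of(double x) {
    assert(std::isfinite(x));
    int e = 0;
    std::frexp(x, &e);  // frexp stores 0 in e when x is zero
    return e;
}

// Sign as a float: 1.0 for x >= 0, -1.0 for x < 0.
double sign_of(double x) {
    return x < 0 ? -1.0 : 1.0;
}

// Mantissa bits including the hidden bit; depends only on the type.
int digits_of(double) {
    return std::numeric_limits<double>::digits;
}

// Significant mantissa bits: digits for non-zero x, 0 for zero.
// Subnormal doubles are ignored here, mirroring the "no denormalized
// numbers" assumption stated in the text.
int precision_of(double x) {
    return x == 0.0 ? 0 : digits_of(x);
}
```

For example, `exponent_of(8.0)` is 4 because 2^3 <= 8 < 2^4, and `exponent_of(1.0)` is 1 because 2^0 <= 1 < 2^1.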
The complete internal representation of a float is encoded in the type decoded_float (or decoded_sfloat, decoded_ffloat, decoded_dfloat, decoded_lfloat, respectively), defined by
struct decoded_typefloat { type mantissa; cl_I exponent; type sign; };
and returned by the function
decoded_typefloat decode_float (const type& x)

For x non-zero, this returns (-1)^s, e, m with x = (-1)^s * 2^e * m and 0.5 <= m < 1.0. For x = 0, it returns (-1)^s=1, e=0, m=0. e is the same as returned by the function float_exponent.
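The normalization 0.5 <= m < 1.0 is the same one used by `std::frexp` in the C++ standard library, so the decomposition can be sketched for a plain `double` as follows. This is an illustration only, not CLN code: the struct `DecodedDouble` and the function `decode_double` are hypothetical names, and the sketch assumes finite input.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical analogue of the decoded-float triple for a plain double.
struct DecodedDouble {
    double mantissa;  // m, with 0.5 <= m < 1.0 (or 0 for x = 0)
    int exponent;     // e
    double sign;      // (-1)^s as a float: 1.0 or -1.0
};

// Splits a finite double so that x = sign * 2^exponent * mantissa.
DecodedDouble decode_double(double x) {
    int e = 0;
    double m = std::frexp(std::fabs(x), &e);  // |x| = m * 2^e
    assert(m == 0.0 || (0.5 <= m && m < 1.0));
    return DecodedDouble{m, e, x < 0 ? -1.0 : 1.0};
}
```

For x = -6.0 this yields sign -1.0, exponent 3 and mantissa 0.75, since 6 = 2^3 * 0.75.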
A complete decoding in terms of integers is provided as type
struct cl_idecoded_float { cl_I mantissa; cl_I exponent; cl_I sign; };
by the following function:
cl_idecoded_float integer_decode_float (const type& x)

For x non-zero, this returns (-1)^s, e, m with x = (-1)^s * 2^e * m and m an integer with float_digits(x) bits. For x = 0, it returns (-1)^s=1, e=0, m=0. WARNING: The exponent e is not the same as the one returned by the functions decode_float and float_exponent.
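The difference between the two exponents is easiest to see in a small example. The sketch below reproduces the integer decomposition for a plain `double` with the C++ standard library; it is not CLN code, the names `IntDecodedDouble` and `integer_decode_double` are hypothetical, and it assumes a finite input and a 53-bit `double` mantissa.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical integer analogue of the decoded triple for a plain double.
struct IntDecodedDouble {
    std::int64_t mantissa;  // integer m with `digits` bits (0 for x = 0)
    int exponent;           // e such that |x| = m * 2^e
    int sign;               // 1 or -1
};

IntDecodedDouble integer_decode_double(double x) {
    const int digits = std::numeric_limits<double>::digits;
    int e = 0;
    double frac = std::frexp(std::fabs(x), &e);  // 0.5 <= frac < 1, or 0
    if (frac == 0.0) return IntDecodedDouble{0, 0, 1};
    // Shifting the fraction left by `digits` bits makes it an exact integer.
    auto m = static_cast<std::int64_t>(std::ldexp(frac, digits));
    assert((m >> (digits - 1)) == 1);  // top bit set: m has exactly `digits` bits
    return IntDecodedDouble{m, e - digits, x < 0 ? -1 : 1};
}
```

For x = 1.0 the fractional decomposition gives exponent 1, while the integer decomposition gives mantissa 2^52 and exponent -52: the exponent is shifted down by the number of mantissa bits.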
One further function is implemented only for class cl_F:
cl_F float_sign (const cl_F& x, const cl_F& y)

This returns a floating-point number whose precision and absolute value are those of y and whose sign is that of x. If x is zero, it is treated as positive. Same for y.
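A sketch of the same sign-transfer rule for plain `double` values is shown below. It is not CLN code, and the names `sign_transfer` and `sign_transfer_examples` are hypothetical. It deliberately avoids `std::copysign`, which would honor the sign bit of an IEEE negative zero, whereas the rule above treats zero as positive.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical analogue for plain doubles: |y| with the sign of x,
// treating a zero x as positive.
double sign_transfer(double x, double y) {
    double magnitude = std::fabs(y);
    return x < 0 ? -magnitude : magnitude;
}

// Small self-check of the helper above.
void sign_transfer_examples() {
    assert(sign_transfer(-1.0, 3.0) == -3.0);  // sign taken from x
    assert(sign_transfer(2.0, -3.0) == 3.0);   // magnitude taken from y
    assert(sign_transfer(0.0, -5.0) == 5.0);   // zero x counts as positive
}
```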
