3.2 Floating-point numbers
Not all real numbers can be represented exactly. (There is an easy mathematical proof for this: Only a countable set of numbers can be stored exactly in a computer, even if one assumes that it has unlimited storage. But there are uncountably many real numbers.) So some approximation is needed. CLN implements ordinary floating-point numbers, with mantissa and exponent.
The elementary operations (+, -, *, /, …) only return approximate results. For example, the value of the expression
(cl_F) 0.3 + (cl_F) 0.4
prints as ‘0.70000005’, not as ‘0.7’. Rounding errors like this one are inevitable when computing with floating-point numbers.
Nevertheless, CLN rounds the floating-point results of the operations +, -, *, /, sqrt according to the “round-to-even” rule: It first computes the exact mathematical result and then returns the floating-point number which is nearest to this. If two floating-point numbers are equally distant from the ideal result, the one with a 0 in its least significant mantissa bit is chosen.
Similarly, testing floating-point numbers for equality with ‘x == y’ is gambling with random errors. Better check for ‘abs(x - y) < epsilon’ for some well-chosen epsilon.
Floating-point numbers come in four flavors:

- Short floats, type cl_SF. They have 1 sign bit, 8 exponent bits (including the exponent’s sign), and 17 mantissa bits (including the “hidden” bit). They don’t consume heap allocation.
- Single floats, type cl_FF. They have 1 sign bit, 8 exponent bits (including the exponent’s sign), and 24 mantissa bits (including the “hidden” bit). In CLN, they are represented as IEEE single-precision floating point numbers. This corresponds closely to the C/C++ type ‘float’.
- Double floats, type cl_DF. They have 1 sign bit, 11 exponent bits (including the exponent’s sign), and 53 mantissa bits (including the “hidden” bit). In CLN, they are represented as IEEE double-precision floating point numbers. This corresponds closely to the C/C++ type ‘double’.
- Long floats, type cl_LF. They have 1 sign bit, 32 exponent bits (including the exponent’s sign), and n mantissa bits (including the “hidden” bit), where n >= 64. The precision of a long float is unlimited, but once created, a long float has a fixed precision. (No “lazy recomputation”.)
Of course, computations with long floats are more expensive than those with smaller floating-point formats.
CLN does not implement features like NaNs, denormalized numbers and gradual underflow. If the exponent range of some floating-point type is too limited for your application, choose another floating-point type with larger exponent range.
As a user of CLN, you can forget about the differences between the four floating-point types and just declare all your floating-point variables as being of type cl_F. This has the advantage that when you change the precision of some computation (say, from cl_DF to cl_LF), you don’t have to change the code, only the precision of the initial values. Also, many transcendental functions have been declared as returning a cl_F when the argument is a cl_F, but such declarations are missing for the types cl_SF, cl_FF, cl_DF, cl_LF. (Such declarations would be wrong if the floating-point contagion rule happened to change in the future.)
This document was generated on August 27, 2013 using texi2html 5.0.