
## 3.2 Floating-point numbers

Not all real numbers can be represented exactly. (There is an easy mathematical proof for this: Only a countable set of numbers can be stored exactly in a computer, even if one assumes that it has unlimited storage. But there are uncountably many real numbers.) So some approximation is needed. CLN implements ordinary floating-point numbers, with mantissa and exponent.

The elementary operations (`+`, `-`, `*`, `/`, …) only return approximate results. For example, the value of the expression `(cl_F) 0.3 + (cl_F) 0.4` prints as ‘0.70000005’, not as ‘0.7’. Rounding errors like this one are inevitable when computing with floating-point numbers.
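The same effect can be reproduced without CLN. The following is a minimal sketch that uses the built-in C++ type `float` as a stand-in for a single-precision float; it assumes IEEE single-precision arithmetic and is not CLN code. The helper name `sum_03_04` is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Adds 0.3 and 0.4 in single precision and formats the sum with
// 8 significant digits. The built-in `float` type is only a stand-in
// for a single-precision float here; this is an illustration, not CLN code.
std::string sum_03_04()
{
    volatile float a = 0.3f;  // volatile: keep the operands in float precision
    volatile float b = 0.4f;
    float sum = a + b;        // the exact sum is rounded to a float
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.8g", static_cast<double>(sum));
    return buf;               // "0.70000005" with IEEE single precision
}
```

Neither 0.3 nor 0.4 is exactly representable in binary, so the operands are already slightly off before the addition takes place.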

Nevertheless, CLN rounds the floating-point results of the operations `+`, `-`, `*`, `/`, `sqrt` according to the “round-to-even” rule: It first computes the exact mathematical result and then returns the floating-point number which is nearest to this. If two floating-point numbers are equally distant from the ideal result, the one with a `0` in its least significant mantissa bit is chosen.
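The tie-breaking part of the rule can be observed with built-in types as well. This sketch assumes IEEE 754 arithmetic in the default rounding mode, where converting a `double` to a `float` rounds to nearest with ties going to the even mantissa. It uses standard C++ only, not CLN, and the function names are hypothetical.

```cpp
#include <cmath>

// Spacing between adjacent floats in the interval [1, 2): 2^-23.
const double ulp = std::ldexp(1.0, -23);

// Rounds an exactly known value (held in a double) to the nearest float.
// Assumes the default IEEE round-to-nearest-even conversion.
float round_to_float(double exact)
{
    return static_cast<float>(exact);
}

// 1 + ulp/2 lies exactly halfway between 1 (even mantissa) and
// 1 + ulp (odd mantissa), so the tie goes down to 1.
bool tie_rounds_down_to_even()
{
    return round_to_float(1.0 + ulp / 2) == 1.0f;
}

// 1 + 3*ulp/2 lies exactly halfway between 1 + ulp (odd mantissa) and
// 1 + 2*ulp (even mantissa), so the tie goes up to 1 + 2*ulp.
bool tie_rounds_up_to_even()
{
    return round_to_float(1.0 + 1.5 * ulp) == round_to_float(1.0 + 2 * ulp);
}
```

Both ties end on the neighbor whose last mantissa bit is `0`, so the rounding direction depends on the neighbors and not on a fixed "round half up" convention.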

Because of these rounding errors, testing floating-point numbers for equality with ‘x == y’ is gambling with random errors. It is better to check for ‘abs(x - y) < epsilon’ for some well-chosen `epsilon`.
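A tolerance check of this kind might look as follows. This is a minimal sketch on the built-in type `double`, not CLN code; the name `nearly_equal` is hypothetical, and the right `epsilon` depends on the magnitude of the values and on the errors accumulated by the computation.

```cpp
#include <cmath>

// Returns true when x and y differ by less than epsilon.
// The caller chooses epsilon to suit the scale of the data.
bool nearly_equal(double x, double y, double epsilon)
{
    return std::fabs(x - y) < epsilon;
}
```

For example, `nearly_equal(0.1 + 0.2, 0.3, 1e-12)` holds, whereas the direct comparison `0.1 + 0.2 == 0.3` is false in IEEE double precision.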

Floating-point numbers come in four flavors:

• Short floats, type `cl_SF`. They have 1 sign bit, 8 exponent bits (including the exponent’s sign), and 17 mantissa bits (including the “hidden” bit). They are stored without any heap allocation.
• Single floats, type `cl_FF`. They have 1 sign bit, 8 exponent bits (including the exponent’s sign), and 24 mantissa bits (including the “hidden” bit). In CLN, they are represented as IEEE single-precision floating point numbers. This corresponds closely to the C/C++ type ‘float’.
• Double floats, type `cl_DF`. They have 1 sign bit, 11 exponent bits (including the exponent’s sign), and 53 mantissa bits (including the “hidden” bit). In CLN, they are represented as IEEE double-precision floating point numbers. This corresponds closely to the C/C++ type ‘double’.
• Long floats, type `cl_LF`. They have 1 sign bit, 32 exponent bits (including the exponent’s sign), and n mantissa bits (including the “hidden” bit), where n >= 64. The precision available to long floats is unlimited, since n can be chosen as large as needed, but once created, each long float has a fixed precision. (No “lazy recomputation”.)
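The bit counts given for single and double floats can be checked against the C/C++ types they correspond to. This sketch assumes a platform where `float` and `double` are IEEE single and double precision. It uses only the standard library, not CLN, and the helper `exponent_bits` is hypothetical.

```cpp
#include <climits>
#include <limits>

// Number of exponent bits of an IEEE-style binary type T: all bits,
// minus the sign bit, minus the stored mantissa bits. The "hidden" bit
// is counted in numeric_limits<T>::digits but is not stored.
template <typename T>
constexpr int exponent_bits()
{
    return static_cast<int>(sizeof(T) * CHAR_BIT) - 1
           - (std::numeric_limits<T>::digits - 1);
}

// Mantissa widths including the hidden bit.
static_assert(std::numeric_limits<float>::digits == 24, "float: 24 mantissa bits");
static_assert(std::numeric_limits<double>::digits == 53, "double: 53 mantissa bits");

// Exponent widths.
static_assert(exponent_bits<float>() == 8, "float: 8 exponent bits");
static_assert(exponent_bits<double>() == 11, "double: 11 exponent bits");
```

If one of the assertions fails to compile, the platform's `float` or `double` is not the IEEE format assumed here.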

Of course, computations with long floats are more expensive than those with smaller floating-point formats.

CLN does not implement features like NaNs, denormalized numbers and gradual underflow. If the exponent range of some floating-point type is too limited for your application, choose another floating-point type with larger exponent range.

As a user of CLN, you can forget about the differences between the four floating-point types and just declare all your floating-point variables as being of type `cl_F`. This has the advantage that when you change the precision of some computation (say, from `cl_DF` to `cl_LF`), you don’t have to change the code, only the precision of the initial values. Also, many transcendental functions have been declared as returning a `cl_F` when the argument is a `cl_F`, but such declarations are missing for the types `cl_SF`, `cl_FF`, `cl_DF`, `cl_LF`. (Such declarations would be wrong if the floating point contagion rule happened to change in the future.)


This document was generated on August 27, 2013 using texi2html 5.0.
