[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |

### 15.8.6 Floating Point

Floating point arithmetic is used in GMP for multiplications on CPUs with poor
integer multipliers. It’s mostly useful for `mpn_mul_1`

,
`mpn_addmul_1`

and `mpn_submul_1`

on 64-bit machines, and
`mpn_mul_basecase`

on both 32-bit and 64-bit machines.

With IEEE 53-bit double precision floats, integer multiplications producing up
to 53 bits will give exact results. Breaking a 64x64 multiplication
into eight 16x*32->48* bit pieces is convenient. With
some care though six 21x*32->53* bit products can be
used, if one of the lower two 21-bit pieces also uses the sign bit.

For the `mpn_mul_1`

family of functions on a 64-bit machine, the
invariant single limb is split at the start, into 3 or 4 pieces. Inside the
loop, the bignum operand is split into 32-bit pieces. Fast conversion of
these unsigned 32-bit pieces to floating point is highly machine-dependent.
In some cases, reading the data into the integer unit, zero-extending to
64-bits, then transferring to the floating point unit back via memory is the
only option.

Converting partial products back to 64-bit limbs is usually best done as a
signed conversion. Since all values are smaller than *2^53*, signed
and unsigned are the same, but most processors lack unsigned conversions.

Here is a diagram showing 16x32 bit products for an `mpn_mul_1`

or
`mpn_addmul_1`

with a 64-bit limb. The single limb operand V is split
into four 16-bit parts. The multi-limb operand U is split in the loop into
two 32-bit parts.

+---+---+---+---+ |v48|v32|v16|v00| V operand +---+---+---+---+ +-------+---+---+ x | u32 | u00 | U operand (one limb) +---------------+ --------------------------------- +-----------+ | u00 x v00 | p00 48-bit products +-----------+ +-----------+ | u00 x v16 | p16 +-----------+ +-----------+ | u00 x v32 | p32 +-----------+ +-----------+ | u00 x v48 | p48 +-----------+ +-----------+ | u32 x v00 | r32 +-----------+ +-----------+ | u32 x v16 | r48 +-----------+ +-----------+ | u32 x v32 | r64 +-----------+ +-----------+ | u32 x v48 | r80 +-----------+

*p32* and *r32* can be summed using floating-point addition, and
likewise *p48* and *r48*. *p00* and *p16* can be summed
with *r64* and *r80* from the previous iteration.

For each loop then, four 49-bit quantities are transferred to the integer unit, aligned as follows,

|-----64bits----|-----64bits----| +------------+ | p00 + r64' | i00 +------------+ +------------+ | p16 + r80' | i16 +------------+ +------------+ | p32 + r32 | i32 +------------+ +------------+ | p48 + r48 | i48 +------------+

The challenge then is to sum these efficiently and add in a carry limb,
generating a low 64-bit result limb and a high 33-bit carry limb (*i48*
extends 33 bits into the high half).

[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |

This document was generated on *March 31, 2014* using *texi2html 5.0*.