[Beowulf] AMD64 results...
josip at lanl.gov
Thu Dec 16 08:17:06 PST 2004
Robert G. Brown wrote:
> [...] One can see how having 64 bits would really
> speed up 64 bit division compared to doing it in software across
> multiple 32 bit registers...
Correct me if I'm wrong, but doesn't the floating point unit normally
use an internal iterative process to perform the division? This would
not involve 32-bit registers...
I'm not so sure about *integer* 64-bit division. Integer division may
involve multiple 32-bit integer registers.
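
To make the integer case concrete, here is a minimal sketch of dividing a 64-bit unsigned value, held as two 32-bit halves, by a 32-bit divisor. The function name divmod_u64_by_u32 and the (hi, lo) layout are illustrative assumptions, not taken from any particular compiler or runtime. Each step divides a two-word dividend by a one-word divisor, which is the kind of operation a 32-bit divide instruction can provide. A full 64-by-64-bit division needs more work than this.

```python
MASK32 = 0xFFFFFFFF

def divmod_u64_by_u32(hi, lo, d):
    """Divide the 64-bit value (hi << 32 | lo) by a 32-bit divisor d.

    Returns (q_hi, q_lo, r): the two 32-bit halves of the quotient
    and the 32-bit remainder. Hypothetical helper for illustration.
    """
    if not (0 < d <= MASK32):
        raise ValueError("divisor must be a nonzero 32-bit value")
    # Step 1: divide the high word; the remainder carries into step 2.
    q_hi, r = divmod(hi & MASK32, d)
    # Step 2: divide (remainder : low word). Since r < d, the quotient
    # of this step is guaranteed to fit in 32 bits.
    q_lo, r = divmod((r << 32) | (lo & MASK32), d)
    return q_hi, q_lo, r
```

With native 64-bit registers the same result comes from a single divide, which is presumably the speedup being discussed.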
Good ole' Cray-1 used an iterative process for floating point division
which worked like this: given a floating point number x, use the first 8
bits of the mantissa to index into a lookup table containing initial
guesses, then do a few steps of Newton-Raphson iteration involving only
multiply-add operations to get the fully converged reciprocal mantissa,
fix the exponent, thus obtaining 1/x, then multiply y*(1/x) to get y/x.
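
The scheme described above can be sketched roughly as follows. This is only an illustration of the idea in double precision, not the actual Cray-1 hardware algorithm: the table contents, the names TABLE, reciprocal and divide, and the choice of three Newton-Raphson steps are all assumptions made for the example.

```python
import math

TABLE_BITS = 8
# Hypothetical lookup table: one initial guess for 1/m per bucket,
# indexed by the leading 8 bits of the mantissa m in [0.5, 1).
TABLE = [1.0 / (0.5 + (i + 0.5) / 2 ** (TABLE_BITS + 1))
         for i in range(2 ** TABLE_BITS)]

def reciprocal(x, steps=3):
    """Approximate 1/x using a table lookup plus Newton-Raphson steps."""
    if x == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # Split |x| into mantissa m in [0.5, 1) and exponent e: |x| = m * 2**e.
    m, e = math.frexp(abs(x))
    # Use the leading mantissa bits to pick the initial guess.
    r = TABLE[int((m - 0.5) * 2 ** (TABLE_BITS + 1))]
    # Newton-Raphson for f(r) = 1/r - m: r <- r * (2 - m * r).
    # Only multiplies and adds; the error roughly squares each step.
    for _ in range(steps):
        r = r * (2.0 - m * r)
    # Fix the exponent and restore the sign: 1/x = (1/m) * 2**(-e).
    return math.copysign(math.ldexp(r, -e), x)

def divide(y, x):
    """Compute y/x as y * (1/x)."""
    return y * reciprocal(x)
```

Note that y * (1/x) computed this way is not guaranteed to match a correctly rounded y/x in the last bit, which is one reason such results can differ slightly from IEEE 754 division.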
As I recall, the famous Pentium FDIV bug involved some corner cases in a
related table-driven iterative process (SRT division, where a few lookup
table entries were missing), all of which is internal to the floating
point unit. Moreover, in addition to following the 32/64-bit IEEE 754
standard for floating point arithmetic, some implementations (e.g.
Pentium, Opteron) support x87 legacy internal 80-bit representations of
floating point numbers, which can really help when accumulating long
sums and computing square roots, etc. Prof. Kahan has numerous
arguments in favor of this internal 80-bit representation...