[Beowulf] Question about amd64 architecture and floating point operations

Ivan Paganini ispmarin at gmail.com
Wed Nov 22 06:48:12 PST 2006


Thank you, John and Vincent, for the answers!
Well, I did a little more research and tested a very simple program (attached).
I ran ldd on it and got
+++++++++
asgard at marvin:~$ ldd a.out
libc.so.6 => /lib/libc.so.6 (0x00002b630e4d5000)
/lib64/ld-linux-x86-64.so.2 (0x00002b630e3b8000)
+++++++++
So I think that my gcc is using the 64-bit libraries.

My question now points in another direction: the SSE/SSE2 registers can
hold, as Vincent said, 2 x 64-bit floats or 2 x 64-bit
integers. Forgetting about the SIMD instructions (I am not using vectorization
at any level yet), what takes two registers on a 32-bit processor (a 64-bit
double) takes just one register on amd64 (or half of a 128-bit register, for
a 64-bit float), right? Is gcc already using these amd64
registers? Or the PGI compilers, or the Sun Studio Express compilers? To be
clearer, what I am expecting is that, if I need 10 digits, on a
32-bit processor I would need 2 registers, and so get slower code (two cycles
per instruction, etc.), while on 64 bits I would need only one
register, and so my code would be faster.

And we scientists are worried about these roundoff errors ;-)

Thank you!

Ivan





2006/11/22, Vincent Diepeveen <diep at xs4all.nl>:
>
> hi Ivan,
>
> there are different compiler versions of gcc for your amd64 hardware: one
> that is 32 bits and one that is 64 bits.
> For 64-bit doubles, however, it doesn't matter which of the two you use,
> yet I'd advise going for the 64-bit version.
>
> To define variables just use:
>   "double"
>
> double is 64 bits in total when using gcc. intel c++ sometimes uses less,
> but I guess not on x64. That kind of cheating happens more for specfp and
> chips that lack certain important instructions, such as itanium, which lacks
> a lot.
>
> The mantissa in a double takes 52 bits of the total length of the number,
> then 1 bit for the sign and the other 11 for the exponent.
>
> the 128-bit SSE/SSE2 registers are split up as vector registers.
> So either 2 independent, unrelated doubles of 64 bits, 4 floats of 32 bits,
> or 4 integers of 32 bits or 2 integers of 64 bits.
>
> if you have no professor's degree in spaghetti programming, then better stay
> away from hand-written SIMD (SSE/SSE2) and let the compiler automatically
> generate it for you.
>
> You will want to compile on AMD hardware with these additional flags
> (additional to the ones you want for SIMD):
>
> gcc -O2 -march=k8 -mtune=k8 -o myprogram myprogram.c
>
> -O3 and higher are potentially buggy for most software I try, but if you have
> a deterministic way to test executables you can of course try them out.
>
> Let's not complain about those bugs too loudly. The GCC team are happy
> amateurs, like all gnu folks never having followed any course about QAD
> (quality assurance testing), which is the most important thing for a product:
> test until it works reliably and bug-free. However, QAD and testing are for
> professionals rather than amateurs, we understand that very well, and
> instead cheer for having the privilege to work with this freely available
> software.
>
> As it seems the gcc 4.1 release was a snapshot lobotomized deliberately
> for AMD, pgo no longer helps much for most software on AMD hardware with
> gcc. Earlier 4.1 snapshots were, at least for my integer code, a lot faster
> with pgo (using -O3 in the speed tests, by the way). So it is not worth the
> trouble trying to get pgo working, as you need to check it for deterministic
> output too, which with floating point is quite hard, I assume.
>
> Some tricks indeed get rid of a few digits of significance in compilers, or
> cause roundoff errors to show up sooner in the end result (which nearly no
> scientist ever notices, saying more about the scientist than the different
> compilers), yet when using the above tips that shouldn't happen too quickly,
> and in doubles you should have roughly 52 * log(2) / log(10) = 52 / 3.322 = 15 digits or so.
>
> Good Luck!
>
> Vincent
>
> ----- Original Message -----
> *From:* Ivan Paganini <ispmarin at gmail.com>
> *To:* beowulf at beowulf.org
> *Sent:* Wednesday, November 22, 2006 12:01 PM
> *Subject:* [Beowulf] Question about amd64 architecture and floating
> point operations
>
> Hello everybody at beowulf. Sorry about the _really_ newbie question, but
> after doing some tests and researching a little, a question arose when
> fooling around with amd64 (more precisely, an amd64 Athlon 4200 X2) and gcc
> and sun studio 11. The architecture has 64-bit integer registers and 128-bit
> floating point registers, but my test programs in C just gave me the
> same precision that I got with an old athlon 2400 xp (32 bits), that is, long
> double goes only to 1x10^4961, even with the -m64 flag. I always imagined
> that I would get the double precision without the long double declaration
> (or, maybe, 40 bits of precision). What am I missing here? Is it the compiler
> (gcc 4.1, sun studio express 11), the operating system (ubuntu 64 bits edgy),
> or just an error in my logic?
>
> Thank you for the patience!
>
> --
> -----------------------------------------------------------
> Ivan S. P. Marin
> Laboratório de Física Computacional
> lfc.ifsc.usp.br
> Instituto de Física de São Carlos - USP
> ----------------------------------------------------------
>


-- 
-----------------------------------------------------------
Ivan S. P. Marin
Laboratório de Física Computacional
lfc.ifsc.usp.br
Instituto de Física de São Carlos - USP
----------------------------------------------------------