[Beowulf] Question about amd64 architecture and floating point operations

Wed Nov 22 06:49:05 PST 2006

Of course I forgot the code... ;-)

2006/11/22, Ivan Paganini <ispmarin at gmail.com>:
>
> Thank you, John and Vincent, for the answers!
> Well, I did a little more research and tested a very simple program
> (attached). I ran ldd on it and got
> +++++++++
> asgard at marvin:~$ ldd a.out
> libc.so.6 => /lib/libc.so.6 (0x00002b630e4d5000)
> /lib64/ld-linux-x86-64.so.2 (0x00002b630e3b8000)
> +++++++++
> So I think that my gcc is using the 64-bit libraries.
>
> My question now points in another direction: the SSE/SSE2 registers can
> hold, as Vincent said, 2 x 64-bit floats or 2 x 64-bit
> integers. Forgetting about the SIMD instructions (I am not using vectorization
> at any level yet), what takes two registers on a 32-bit processor (a 64-bit
> double) takes just one register on amd64 (or half of a 128-bit register, for
> a 64-bit float), right? Is gcc already using these amd64
> registers? Or the PGI compilers, or the Sun Studio Express compilers?
>
> To be more clear, what I am expecting is that, if I need 10 digits, on a
> 32-bit machine I would need 2 registers, and so slower code (two cycles per
> instruction, etc.), while on 64 bits I would need only one
> register, and so my code would be faster.
>
> And we, scientists, are worried about these roundoff errors ;-)
>
> Thank you!
>
> Ivan
>
> 2006/11/22, Vincent Diepeveen <diep at xs4all.nl>:
> >
> > hi Ivan,
> >
> > there are different versions of gcc for your amd64 hardware: one
> > that is 32 bits and one that is 64 bits.
> > For 64-bit doubles, however, it doesn't matter which of the two you use,
> > yet I'd advise going for the 64-bit version.
> >
> > To define variables just use:
> >   "double"
> >
> > double is 64 bits in total when using gcc. Intel C++ sometimes uses less,
> > but I guess not on x64. That kind of cheating happens more for SPECfp and
> > chips that lack certain important instructions; Itanium, for example, lacks a lot.
> >
> > The mantissa of a double takes 52 bits of the total length of the
> > number (plus an implicit leading bit, so 53 bits of effective precision);
> > then 1 bit for the sign and the other 11 for the exponent.
> >
> > The 128-bit SSE/SSE2 registers are split up as vector registers:
> > either 2 independent, unrelated doubles of 64 bits, 4 floats of 32
> > bits, 4 integers of 32 bits, or 2 integers of 64 bits.
> >
> > If you have no professor's degree in spaghetti programming, then better stay
> > away from hand-written SIMD (SSE/SSE2) and let the compiler generate it
> > for you automatically.
> >
> > On AMD hardware you would want to compile with these flags
> > (in addition to the ones you want for SIMD):
> >
> > gcc -O2 -march=k8 -mtune=k8 -o myprogram
> >
> > -O3 and higher is potentially buggy for most software I try, but if you
> > have a deterministic way to test executables you can of course try it out.
> >
> > Let's not complain about those bugs too loudly. The GCC team are happy
> > amateurs, like all GNU folks, never having followed any course about QAD
> > (quality assurance testing), which is the most important thing for a product: test
> > until it works reliably and bug-free. However, QAD and testing are for
> > professionals rather than amateurs; we understand that very well, and
> > instead cheer for having the privilege to work with this freely available
> > software.
> >
> > As it seems, the gcc 4.1 release was a snapshot deliberately lobotomized
> > for AMD; PGO (profile-guided optimization) no longer helps much for most
> > software on AMD hardware with gcc. Earlier 4.1 snapshots were, at least for
> > my integer code, a lot faster with PGO (using -O3 in the speed tests, by the
> > way). So it is not worth the trouble of trying to get PGO working, as you need
> > to check it for deterministic output too, which with floating point is quite
> > hard, I assume.
> >
> > Some compiler tricks do indeed drop a few digits of significance, or
> > cause roundoff errors to show up sooner in the end result (which nearly no
> > scientist ever notices, saying more about the scientist than about the different
> > compilers), yet when using the tips above that shouldn't happen too quickly,
> > and in doubles you should have roughly 52 * log(2) / log(10) = 52 /
> > 3.322 = 15.65, so 15 digits or so.
> >
> > Good Luck!
> >
> > Vincent
> >
> > ----- Original Message -----
> > *From:* Ivan Paganini <ispmarin at gmail.com>
> > *To:* beowulf at beowulf.org
> > *Sent:* Wednesday, November 22, 2006 12:01 PM
> > *Subject:* [Beowulf] Question about amd64 architecture and floating
> > point operations
> >
> > Hello everybody at beowulf. Sorry about the _really_ newbie question,
> > but after doing some tests and researching a little, a question arose while
> > fooling around with amd64 (more precisely, an amd64 Athlon 4200 X2) and gcc
> > and Sun Studio 11. The architecture has 64-bit integer registers and 128-bit
> > floating point registers, but my test programs in C just gave me the
> > same precision that I got with an old Athlon 2400 XP (32 bits), that is, long
> > double goes only to 1x10^4961, even with the -m64 flag. I always imagined
> > that I would get the double precision without the long double declaration
> > (or, maybe, 40 bits of precision). What am I missing here? Is it the compiler (gcc
> > 4.1, Sun Studio Express 11), the operating system (Ubuntu 64-bit Edgy),
> > or just an error in my logic?
> >
> > Thank you for the patience!
> >
> > --
> > -----------------------------------------------------------
> > Ivan S. P. Marin
> > Laboratório de Física Computacional
> > lfc.ifsc.usp.br
> > Instituto de Física de São Carlos - USP
> > ----------------------------------------------------------
> >

-- 
-----------------------------------------------------------
Ivan S. P. Marin
Laboratório de Física Computacional
lfc.ifsc.usp.br
Instituto de Física de São Carlos - USP
----------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: application/octet-stream
Size: 284 bytes
Desc: not available
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20061122/5947f3c3/attachment.obj>