Of course I forgot the code... ;-)<br><br><div><span class="gmail_quote">2006/11/22, Ivan Paganini <<a href="mailto:ispmarin@gmail.com">ispmarin@gmail.com</a>>:</span><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex">

Thank you John and Vincent for the answers! <br>Well, I did a little more research and tested a very simple program (attached). I ran ldd on it and got <br>+++++++++<br>asgard@marvin:~$ ldd a.out <br>        libc.so.6 => /lib/libc.so.6 (0x00002b630e4d5000) 

<br>        /lib64/ld-linux-x86-64.so.2 (0x00002b630e3b8000)<br>+++++++++<br>So I think that my gcc is using 64-bit libraries.<br><br>My question now points in another direction: the SSE/SSE2 registers can hold, as Vincent said, 2 x 64-bit floats or 2 x 64-bit integers. Forgetting about the SIMD instructions (I am not using vectorization at any level yet), what uses two registers on a 32-bit processor (a 64-bit double) uses just one register on amd64 (half of a 128-bit register, for a 64-bit float), right? Is gcc already using these amd64 registers? Or the PGI compilers, or the Sun Express compilers? To be clearer, what I am expecting is that, if I need 10 digits, on a 

32-bit machine I would need 2 registers, and so get slower code (two cycles per instruction, etc.), while on 64 bits I would need only one register, and so my code would be faster. <br><br>And we scientists are worried about these roundoff errors ;-) 

<br><br>Thank you!<br><br>Ivan<br><br><br><br><br><br><div><span class="gmail_quote">2006/11/22, Vincent Diepeveen <<a href="mailto:diep@xs4all.nl" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

diep@xs4all.nl</a>>:</span><div><span class="e" id="q_10f102364233a48d_1"><blockquote class="gmail_quote" style="margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; margin-left: 0.80ex; border-left-color: #cccccc; border-left-width: 1px; border-left-style: solid; padding-left: 1ex">

<div bgcolor="#ffffff"><div><font face="Arial" size="2">Hi Ivan,</font></div><div> </div><div><font face="Arial" size="2">there are different compiler versions of gcc for your amd64 hardware: one that is 32 bits and one that is 64 bits. 

</font></div><div><font face="Arial" size="2">For 64-bit doubles, however, it doesn't matter which of the two you use, yet I'd advise going for the 64-bit version.</font></div><div> </div><div><font face="Arial" size="2">

 To define variables, just use:</font></div><div><font face="Arial" size="2">  "double"</font></div><div> </div><div><font face="Arial" size="2">double is 64 bits in total when using gcc. Intel C++ sometimes uses less, but 

</font></div><div><font face="Arial" size="2">I guess not on x64. That kind of cheating happens more for SPECfp and on chips that lack certain important instructions; Itanium, for example, lacks a lot.</font></div><div> </div><div>

<font face="Arial" size="2">The mantissa of a double takes 52 of the number's 64 bits and determines its precision.</font></div><div><font face="Arial" size="2">Then there is 1 bit for the sign and the other 11 for the exponent.</font></div><div> </div>

<div><font face="Arial" size="2">The 128-bit SSE/SSE2 registers are split up as vector registers.</font></div><div><font face="Arial" size="2">So each holds either 2 independent, unrelated doubles of 64 bits, 4 floats of 32 bits, 4 integers of 32 bits, or 2 integers of 64 bits. 

</font></div><div> </div><div><font face="Arial" size="2">If you don't have a professor's degree in spaghetti programming, better stay away from hand-written SIMD (SSE/SSE2) and let the compiler generate it for you automatically.</font></div>

<div> </div><div><font face="Arial" size="2">On AMD hardware you will want to compile with these additional flags (in addition to the ones you want for SIMD):</font></div><div> </div><div><font face="Arial" size="2">gcc -O2 -march=k8 -mtune=k8 -o  myprogram 

</font></div><div> </div><div><font face="Arial" size="2">-O3 and higher are potentially buggy for most software I try, but if you have a deterministic way to test executables you can of course try it out. </font></div><div>

  </div><div><font face="Arial" size="2">Let's not complain about those bugs too loudly. The GCC team are happy amateurs who, like all GNU folks, never followed any course on QAD (quality assurance testing), which is the most important part of a product: test until it works reliably and bug-free. However, QAD and testing are for professionals rather than amateurs; we understand that very well, and instead cheer for the privilege of working with this freely available software. 

</font></div><div> </div><div><font face="Arial" size="2">As the gcc 4.1 release seems to have been a snapshot deliberately lobotomized for AMD, PGO (profile-guided optimization) no longer helps much for most software on AMD hardware with gcc. Earlier 4.1

  snapshots were, at least for my integer code, a lot faster with PGO (using -O3 in the speed tests, by the way). So it is not worth the trouble of getting PGO working, as you also need to check it for deterministic output, which with 

</font><font face="Arial" size="2">floating point is quite hard, I assume.</font></div><div> </div><div><font face="Arial" size="2">Some compiler tricks do indeed cost a few digits of significance, or make roundoff errors show up sooner in the end result (which nearly no scientist ever notices, saying more about the scientist than about the different compilers) 

</font><font face="Arial" size="2">, yet with the tips above that shouldn't happen too quickly, and in doubles you should have roughly 52 * log(2) / log(10) = 52 / 3.322 = 15.65, so 15 digits or so.</font></div><div> </div><div><font face="Arial" size="2">

 Good Luck!</font></div><div> </div><div><font face="Arial" size="2">Vincent</font></div><blockquote style="padding-right: 0px; padding-left: 5px; margin-left: 5px; border-left-color: #000000; border-left-width: 2px; border-left-style: solid; margin-right: 0px">

<div><span><div style="font-style: normal; font-variant: normal; font-weight: 400; font-size: 10pt; line-height: normal">----- Original Message -----  </div><div style="background-color: #e4e4e4; font-style: normal; font-variant: normal; font-weight: 400; font-size: 10pt; line-height: normal">

<b>From:</b>    <a title="ispmarin@gmail.com" href="mailto:ispmarin@gmail.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">Ivan Paganini</a>    </div><div style="font-style: normal; font-variant: normal; font-weight: 400; font-size: 10pt; line-height: normal">

<b>To:</b> <a title="beowulf@beowulf.org" href="mailto:beowulf@beowulf.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">beowulf@beowulf.org</a> </div><div style="font-style: normal; font-variant: normal; font-weight: 400; font-size: 10pt; line-height: normal">

<b>Sent:</b> Wednesday, November 22, 2006 12:01    PM</div><div style="font-style: normal; font-variant: normal; font-weight: 400; font-size: 10pt; line-height: normal"><b>Subject:</b> [Beowulf] Question about amd64    architecture and floating point operations

</div><div><br></div>Hello everybody at beowulf. Sorry about the _really_ newbie question, but after doing some tests and researching a little, a question arose while fooling around with amd64 (more precisely, an amd64 Athlon 4200 X2), gcc, and Sun Studio 11. The architecture has 64-bit integer registers and 128-bit floating point registers, but my test programs in C just gave me the same precision that I got with an old Athlon 2400 XP (32 bits); that is, long double goes only to 1x10^4961, even with the -m64 flag. I always imagined that I would get double precision without the long double declaration (or, maybe, 40 bits of precision). What am I missing here? Is it the compiler (gcc 

4.1, Sun Studio Express 11), the operating system (Ubuntu 64-bit Edgy), or just an error in my logic?<br><br>Thank you for your patience!<br clear="all"><br>--    <br>-----------------------------------------------------------  

<br>Ivan S. P.    Marin<br>Laboratório de Física Computacional<br><a href="http://lfc.ifsc.usp.br" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">lfc.ifsc.usp.br</a><br>Instituto de Física de    São Carlos - USP 

<br>----------------------------------------------------------     </span></div>

</blockquote></div></blockquote></span></div></div><br><br clear="all"><br>-- <br>-----------------------------------------------------------<div><span class="e" id="q_10f102364233a48d_3">

<br>Ivan S. P. Marin<br>Laboratório de Física Computacional <br><a href="http://lfc.ifsc.usp.br" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">lfc.ifsc.usp.br</a><br>Instituto de Física de São Carlos - USP

<br>----------------------------------------------------------   </span></div></blockquote></div><br><br clear="all"><br>-- <br>-----------------------------------------------------------<br>Ivan S. P. Marin<br>Laboratório de Física Computacional

<br><a href="http://lfc.ifsc.usp.br">lfc.ifsc.usp.br</a><br>Instituto de Física de São Carlos - USP<br>----------------------------------------------------------