[Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460

Wed Sep 17 17:34:41 PDT 2008

Vincent Diepeveen wrote:
> Nah,
>
> I guess he's referring to sometimes it's using single precision
> floating point to get something done instead of double precision,
> and it tends to keep sometimes stuff in registers.
>
> That isn't a problem necessarily, but if i remember well floating
> point state could get wiped out when switching to SSE2.
>
> Sometimes you lose your FPU registerset in that case.
>
> Main problem is that there is so many dangerous optimizations possible,
> to speedup testsets, because in itself floating point is real slow to
> do at hardware, from hardware viewpoint seen.
>
> Yet in general last generations of intel compilers that has improved 
> really a lot.
Well, running the same code, here is the discrepancy in results I got:
FLOPS:
    my code has to do: 7,975,847,125,000 floating point operations
(~8 Tflop total) ...takes about 15 minutes on an 8*2core Opteron with
32 Gigs-o-RAM (thank you OpenMP ;)

The running times (ran it a _few_ times...but not the statistical 
minimum of 30):
    ICC -> runtime == 689.249  ; summed error == 1651.78
    GCC -> runtime == 1134.404 ; summed error == 0.883501

Compiler Flags:
    icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
    gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC

No trickery, no smoke and mirrors ;) Just a _huge_ kick ASS k-Means
parallelized with OpenMP (thank gawd, otherwise it takes hours to run)
and a rather big database of 1.4 Gigs.

... So this is what I meant by floating point errors. Yes, the runtime
was cut by roughly 40% by ICC (and this is on an *opteron* based system,
Tyan VX50). But the running time wasn't what I was actually looking at;
I was looking at the precision skew, and that's where I fell off my chair.

For the ones itching for a little more specs:

eric@einstein ~ $ icc -V
Intel(R) C Compiler for applications running on Intel(R) 64, Version 10.1    Build 20080602
Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
FOR NON-COMMERCIAL USE ONLY

eric@einstein ~ $ gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: 
/dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure 
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1 
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include 
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1 
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man 
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info 
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4 
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec 
--enable-nls --without-included-gettext --with-system-zlib 
--disable-checking --disable-werror --enable-secureplt --enable-multilib 
--enable-libmudflap --disable-libssp --enable-cld --disable-libgcj 
--enable-languages=c,c++,treelang,fortran --enable-shared 
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu 
--with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.1-r1 
p1.1'
Thread model: posix
gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
>
> Vincent
>
> On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
>
>> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
>>
>>> Also, note that I've had issues with icc
>>> generating really fast but inaccurate code (fp model is not IEEE *by
>>> default*, I am sure _everyone_ knows this and I am stating the obvious
>>> here).
>>
>> All modern, high-performance compilers default that way. It's certainly
>> the case that sometimes it goes more horribly wrong than necessary, but
>> I wouldn't ding icc for this default. Compare results with IEEE mode.
>>
>> -- greg
>>