[Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460
Eric Thibodeau
kyron at neuralbs.com
Wed Sep 17 17:34:41 PDT 2008
Vincent Diepeveen wrote:
> Nah,
>
> I guess he's referring to the compiler sometimes using single precision
> floating point instead of double precision to get something done, and
> to its tendency to sometimes keep values in registers.
>
> That isn't necessarily a problem, but if I remember correctly the
> floating point state could get wiped out when switching to SSE2.
>
> Sometimes you lose your FPU register set in that case.
>
> The main problem is that there are so many dangerous optimizations
> possible to speed up test sets, because floating point in itself is
> really slow to do in hardware, seen from a hardware viewpoint.
>
> Yet in general, in the last generations of Intel compilers that has
> improved a lot.
Well, running the same code, here is the discrepancy in results I got:
FLOPS:
my code has to do: 7,975,847,125,000 floating point operations (~8 Tflop
in total) ...takes 15 minutes on 8*2core Opteron with 32 Gigs-o-RAM
(thank you OpenMP ;)
The running times (ran it a _few_ times...but not the statistical
minimum of 30):
ICC -> runtime == 689.249 ; summed error == 1651.78
GCC -> runtime == 1134.404 ; summed error == 0.883501
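To put those two runs in perspective, here is a minimal sketch in C that turns the operation count and the runtimes above into a sustained rate and a speedup ratio. The function names are hypothetical and are not taken from vqOpenMP.c; the only inputs are the figures quoted in this message.

```c
#include <assert.h>

/* Sustained rate in GFLOP/s for `flop` operations done in `seconds`. */
double gflops_rate(double flop, double seconds)
{
    assert(seconds > 0.0);
    return flop / seconds / 1e9;
}

/* How many times faster the fast run is compared with the slow run. */
double speedup(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

With the figures above, gflops_rate(7975847125000.0, 689.249) comes out at about 11.6 GFLOP/s for ICC and gflops_rate(7975847125000.0, 1134.404) at about 7.0 GFLOP/s for GCC, while speedup(1134.404, 689.249) is about 1.65, i.e. the ICC run takes roughly 61% of the GCC time.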
Compiler Flags:
icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC
No trickery, no smoke and mirrors ;) Just a _huge_ kick ASS k-Means
parallelized with OpenMP (thank gawd, otherwise it takes hours to run)
and a rather big database of 1.4 Gigs
... So this is what I meant by floating point errors. Yes, the runtime
was cut by about 40% by ICC (and this is on an *opteron* based system,
Tyan VX50). The running time wasn't what I was actually looking at,
though; I was after the precision skew, and that's where I fell off my
chair.
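As an illustration of how a summed error can drift that far, here is a minimal sketch in C of one common mechanism: the precision of the accumulator. It is not the code from vqOpenMP.c (which is not shown here), and whether this is what actually happens in the ICC build is an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive single-precision accumulation: every add rounds to 24 bits,
 * so small terms can vanish once the running sum is large. */
float sum_naive_float(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same data, but accumulated in double precision. */
double sum_in_double(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (double)x[i];
    return s;
}

/* Kahan compensated summation, still in single precision.
 * Note: value-unsafe optimizations (e.g. gcc -ffast-math) are allowed
 * to simplify the compensation term away, which defeats the purpose. */
float sum_kahan_float(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;   /* running sum */
    float c = 0.0f;   /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;   /* apply the correction from the last step */
        float t = s + y;      /* low-order bits of y may be lost here */
        c = (t - s) - y;      /* recover what was lost */
        s = t;
    }
    return s;
}
```

Feeding the three values {16777216.0f, 1.0f, 1.0f} to each function under IEEE single-precision evaluation, sum_naive_float returns 16777216 (both ones are lost), while sum_in_double and sum_kahan_float return 16777218. Comparing such variants on the real data, built with and without value-safe floating point settings, is one way to narrow down where the skew comes from.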
For the ones itching for a little more specs:
eric@einstein ~ $ icc -V
Intel(R) C Compiler for applications running on Intel(R) 64, Version
10.1 Build 20080602
Copyright (C) 1985-2008 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
eric@einstein ~ $ gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with:
/dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec
--enable-nls --without-included-gettext --with-system-zlib
--disable-checking --disable-werror --enable-secureplt --enable-multilib
--enable-libmudflap --disable-libssp --enable-cld --disable-libgcj
--enable-languages=c,c++,treelang,fortran --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.1-r1
p1.1'
Thread model: posix
gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
>
> Vincent
>
> On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
>
>> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
>>
>>> Also, note that I've had issues with icc
>>> generating really fast but inaccurate code (fp model is not IEEE *by
>>> default*, I am sure _everyone_ knows this and I am stating the obvious
>>> here).
>>
>> All modern, high-performance compilers default that way. It's certainly
>> the case that sometimes it goes more horribly wrong than necessary, but
>> I wouldn't ding icc for this default. Compare results with IEEE mode.
>>
>> -- greg
>>