[Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460
Vincent Diepeveen
diep at xs4all.nl
Wed Sep 17 17:37:07 PDT 2008
How does all this change when you use a PGO-optimized executable
with both compilers?
Vincent
On Sep 18, 2008, at 2:34 AM, Eric Thibodeau wrote:
> Vincent Diepeveen wrote:
>> Nah,
>>
>> I guess he's referring to the compiler sometimes using single precision
>> floating point instead of double precision to get something done, and
>> to its tendency to keep some values in registers.
>>
>> That isn't necessarily a problem, but if I remember correctly the
>> floating point state could get wiped out when switching to SSE2.
>>
>> Sometimes you lose your FPU register set in that case.
>>
>> The main problem is that there are so many dangerous optimizations
>> possible to speed up test sets, because floating point is inherently
>> slow to do in hardware.
>>
>> Yet in general this has improved a lot in the last few generations
>> of Intel compilers.
> Well, running the same code, here is the result discrepancy I got:
> FLOPS:
> my code has to do: 7,975,847,125,000 (~8 Tflop in total) ...takes
> 15 minutes on 8*2core Opteron with 32 Gigs-o-RAM (thank you OpenMP ;)
>
> The running times (ran it a _few_ times...but not the statistical
> minimum of 30):
> ICC -> runtime == 689.249 ; summed error == 1651.78
> GCC -> runtime == 1134.404 ; summed error == 0.883501
>
> Compiler Flags:
> icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
> gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC
>
> No trickery, no smoke and mirrors ;) Just a _huge_ kick ASS k-Means
> parallelized with OpenMP (thank gawd, otherwise it takes hours to
> run) and a rather big database of 1.4 Gigs
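For readers who have not seen this kind of code, here is a minimal sketch of a k-means assignment step that accumulates the summed error through an OpenMP reduction. It is not the vqOpenMP.c from this thread, which was not posted: the function name `assign_and_sum_error`, the row-major data layout and the squared Euclidean distance are all assumptions made for illustration.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper, not taken from vqOpenMP.c.
 * Assigns each point to its nearest centroid (squared Euclidean distance)
 * and returns the sum of those nearest distances.
 * points:    n_points * dim values, row-major
 * centroids: n_centroids * dim values, row-major
 * labels:    output, one centroid index per point */
double assign_and_sum_error(const double *points, size_t n_points,
                            const double *centroids, size_t n_centroids,
                            size_t dim, int *labels)
{
    double total_error = 0.0;
    long i; /* signed loop index, accepted by older OpenMP versions too */

    /* Each thread keeps a private partial sum; the partial sums are
     * combined at the end of the loop. */
    #pragma omp parallel for reduction(+:total_error)
    for (i = 0; i < (long)n_points; ++i) {
        double best = DBL_MAX;
        int best_k = 0;
        for (size_t k = 0; k < n_centroids; ++k) {
            double dist = 0.0;
            for (size_t d = 0; d < dim; ++d) {
                double diff = points[(size_t)i * dim + d]
                            - centroids[k * dim + d];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                best_k = (int)k;
            }
        }
        labels[i] = best_k;
        total_error += best;
    }
    return total_error;
}
```

Built without OpenMP support the pragma is ignored and the loop runs serially. With OpenMP, the order in which the per-thread partial sums are combined is not specified, so the low-order bits of the summed error may differ between runs, thread counts or compilers even before any compiler floating point settings come into play.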
>
> ... So this is what I meant by floating point errors. Yes, the
> runtime was cut by about 40% by ICC (and this is on an *opteron* based
> system, Tyan VX50). I wasn't actually looking at the running time
> but at the precision skew, and that's where I fell off
> my chair.
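The thread does not establish what caused the gap in summed error, so the following is only a sketch of one general mechanism, not a diagnosis: how much the precision and the form of an accumulation can change a long sum. The three function names are made up for this example. They add the same single precision value n times using a plain float accumulator, a double accumulator, and Kahan (compensated) summation in float.

```c
#include <stddef.h>

/* Naive accumulation in single precision: once the running sum is large,
 * each addition is rounded to the coarse spacing of floats near the sum. */
float sum_naive_float(float value, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += value;
    return sum;
}

/* The same additions, accumulated in double precision. */
double sum_double(float value, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)value;
    return sum;
}

/* Kahan summation in single precision: comp carries the rounding error
 * of the previous addition and feeds it back into the next one. */
float sum_kahan_float(float value, size_t n)
{
    float sum = 0.0f;
    float comp = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float y = value - comp; /* apply the stored correction */
        float t = sum + y;      /* low-order bits of y are lost here */
        comp = (t - sum) - y;   /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

Calling each function with 0.1f and 10,000,000 on a typical x86_64 build with strict floating point semantics, the naive float sum drifts far from 1,000,000 while the double and Kahan sums stay close to it. The Kahan version relies on the compiler not simplifying (t - sum) - y algebraically, which is the kind of rewrite a relaxed floating point model may permit, so results from it are only meaningful when compared under a strict model.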
>
> For the ones itching for a little more specs:
>
> eric at einstein ~ $ icc -V
> Intel(R) C Compiler for applications running on Intel(R) 64,
> Version 10.1 Build 20080602
> Copyright (C) 1985-2008 Intel Corporation. All rights reserved.
> FOR NON-COMMERCIAL USE ONLY
>
> eric at einstein ~ $ gcc -v
> Using built-in specs.
> Target: x86_64-pc-linux-gnu
> Configured with: /dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure
> --prefix=/usr
> --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1
> --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include
> --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1
> --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man
> --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info
> --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4
> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu
> --disable-altivec --enable-nls --without-included-gettext
> --with-system-zlib --disable-checking --disable-werror
> --enable-secureplt --enable-multilib --enable-libmudflap
> --disable-libssp --enable-cld --disable-libgcj
> --enable-languages=c,c++,treelang,fortran --enable-shared
> --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> --with-bugurl=http://bugs.gentoo.org/
> --with-pkgversion='Gentoo 4.3.1-r1 p1.1'
> Thread model: posix
> gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
>>
>> Vincent
>>
>> On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
>>
>>> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
>>>
>>>> Also, note that I've had issues with icc
>>>> generating really fast but inaccurate code (fp model is not IEEE
>>>> *by
>>>> default*, I am sure _everyone_ knows this and I am stating the
>>>> obvious
>>>> here).
>>>
>>> All modern, high-performance compilers default that way. It's
>>> certainly
>>> the case that sometimes it goes more horribly wrong than
>>> necessary, but
>>> I wouldn't ding icc for this default. Compare results with IEEE
>>> mode.
>>>
>>> -- greg
>>>
>
>