[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]
Mikhail Kuzminsky
kus at free.net
Mon Jul 4 10:48:28 PDT 2005
In message from Vincent Diepeveen <diep at xs4all.nl> (Mon, 04 Jul 2005 17:59:40 +0200):
> ...
> ...
>Of course we take a large buffer. Around 400MB is the working set size
>for the hashtable which I use for my chess software (which is reading
>randomly 8-64 bytes from the cache).
>
>Results:
> single cpu A64 :  91 ns (CL2 memory)
> single cpu P4  : 220 ns (CL2 memory, bus overclocked)
> dual Opteron   : 120 ns
> quad Opteron   : 133 ns
> dual Xeon      : 280 ns (800 MHz bus)
> dual Xeon      : 400 ns (533 MHz bus)
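A minimal sketch of the kind of measurement described above, assuming a pointer-chasing approach (the post does not say exactly how the numbers were taken): the buffer is linked into one random cycle, so every load depends on the previous one and neither out-of-order execution nor the prefetcher can hide the miss. The names `build_chain`, `is_single_cycle` and `chase_ns_per_load` are hypothetical, and 8-byte slots stand in for the 8-64 byte hash entries.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Written after each timed run so the chase loop is not optimized away. */
static volatile size_t latency_sink;

/* xorshift64 PRNG: avoids being limited by a small platform RAND_MAX
   when shuffling tens of millions of slots. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Sattolo's algorithm: next[i] is the slot visited after slot i, and the
   whole array forms a single random cycle through all n slots. */
size_t *build_chain(size_t n, uint64_t seed)
{
    assert(n >= 2 && seed != 0);
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i); /* j < i: Sattolo, not Fisher-Yates */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    return next;
}

/* Returns 1 if following next[] from slot 0 visits all n slots
   before coming back to slot 0, else 0. */
int is_single_cycle(const size_t *next, size_t n)
{
    size_t p = 0;
    for (size_t s = 1; s < n; s++) {
        p = next[p];
        if (p == 0) return 0;
    }
    return next[p] == 0;
}

/* Average nanoseconds per dependent load over `steps` hops. */
double chase_ns_per_load(const size_t *next, size_t steps)
{
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++) p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    latency_sink = p;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

For a working set comparable to the 400MB hashtable one would call `build_chain((400u << 20) / sizeof(size_t), 42)` and then `chase_ns_per_load(next, 20000000)`, ideally after one untimed warm-up pass. With 4 KB pages a buffer this size also misses the TLB on most hops, so the figure includes page-walk cost, which is presumably what the chess program experiences as well.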
The latencies should depend on the processor frequencies (although the
RAM contribution is much larger), so what were the frequencies of the
A64/P4/Opteron/Xeon?
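As a rough illustration of that point, under a simplified model assumed here (not taken from measured data), the load latency splits into an on-chip part of $N_{\text{core}}$ cycles that scales with the core clock $f$ and a DRAM part $t_{\text{DRAM}}$ that does not:

```latex
t_{\text{load}} \approx \frac{N_{\text{core}}}{f} + t_{\text{DRAM}}
```

With a hypothetical $N_{\text{core}} = 100$ cycles, raising $f$ from 2.0 GHz to 2.4 GHz shrinks the first term from 50 ns to about 41.7 ns while $t_{\text{DRAM}}$ stays the same, so the clock of each machine matters when comparing the figures above.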
And do I understand correctly that you have 1/2/4 threads which
perform "random" reads of a few bytes from main memory?
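To make the 1/2/4-thread question concrete, here is a sketch assuming POSIX threads, in which each thread builds and chases its own private random cycle and the per-thread averages are combined; `parallel_latency_ns` and `chase_job` are hypothetical names. Because each thread initializes its own buffer, Linux's default first-touch policy will usually place that memory on the node the thread was running on, which matters on NUMA Opteron systems. The threads are not barrier-synchronized, so their timed regions overlap only approximately.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define MAX_CHASERS 64

typedef struct {
    size_t slots;       /* private buffer size, in size_t slots */
    size_t steps;       /* number of dependent loads to time */
    uint64_t seed;      /* nonzero xorshift seed, distinct per thread */
    size_t *buf;        /* buffer published here so it escapes before timing */
    size_t last;        /* final index reached; consumed so the loop is kept */
    double ns_per_load; /* result for this thread */
    int ok;             /* 1 if the thread completed */
} chase_job;

static uint64_t chase_rng(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

static void *chase_thread(void *arg)
{
    chase_job *job = arg;
    size_t n = job->slots;
    assert(n >= 2);
    job->ok = 0;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    job->buf = next;
    for (size_t i = 0; i < n; i++) next[i] = i;  /* first touch on this thread */
    for (size_t i = n - 1; i > 0; i--) {        /* Sattolo: one cycle over all slots */
        size_t j = (size_t)(chase_rng(&job->seed) % i);
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < job->steps; s++) p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    job->last = p;
    job->ns_per_load = ((double)(t1.tv_sec - t0.tv_sec) * 1e9
                        + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)job->steps;
    job->ok = 1;
    job->buf = NULL;
    free(next);
    return NULL;
}

/* Runs `threads` concurrent chasers and returns their mean ns per load,
   or -1.0 on bad arguments or failure. */
double parallel_latency_ns(int threads, size_t slots, size_t steps)
{
    pthread_t tid[MAX_CHASERS];
    chase_job job[MAX_CHASERS];
    if (threads < 1 || threads > MAX_CHASERS || slots < 2 || steps < 1) return -1.0;
    int started = 0;
    for (int t = 0; t < threads; t++) {
        job[t] = (chase_job){ .slots = slots, .steps = steps,
                              .seed = 0x9E3779B97F4A7C15ull + (uint64_t)t };
        if (pthread_create(&tid[t], NULL, chase_thread, &job[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    if (started != threads) return -1.0;
    double sum = 0.0;
    for (int t = 0; t < threads; t++) {
        if (!job[t].ok) return -1.0;
        sum += job[t].ns_per_load;
    }
    return sum / threads;
}
```

For example, `parallel_latency_ns(4, (100u << 20) / sizeof(size_t), 10000000)` would report the mean per-load latency with four concurrent chasers of 100 MB each; comparing 1, 2 and 4 threads then shows how much the memory system degrades under concurrent random reads.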
>
>So obviously, for things that do not fit in the L2 cache, the Opteron runs
>away with it. Only if the executable in question was optimized by the Intel
>C++ compiler, which will have done things to make it run faster on Intel
>processors than on the Opteron, do the results not look too bad for the P4.
If the results above are for a "bad" (poorly optimizing) compiler, then
in some sense it's the compiler's problem :-) Yes, old binaries will run
slowly. But many, many HPC applications can be compiled from source.
BTW, the better results are for the Intel compiler (icc) only; do you know
anything about the PGI and PathScale compilers?
>Yet that's a matter of optimizing it better for the Opteron, which most
>software dudes do NOT do, as Intel delivers good support and AMD
>historically didn't deliver *any* kind of support (they are improving
>now, but even then their math libraries are so pathetic compared to the
>ease of the Intel libraries that I can imagine at least *that* part of
>the problems).
ACML 2.1 gives me good results on Opteron in comparison with MKL.
Yours
Mikhail