[Beowulf] Opteron performance
Joe Landman
landman at scalableinformatics.com
Sat Nov 27 16:30:41 PST 2004
Kozin, I (Igor) wrote:
>Regarding Opteron performance and performance in general.
>
>To the Opteron users out there, would you mind sharing your
>experience to date regarding Opteron performance?
>
>It would be particularly interesting to hear
>- what the problems are (if any),
>
>
NUMA support is maturing. Processor/memory affinity, memory layout, and
other issues impact benchmarks dramatically. You want your thread
running on a CPU adjacent to the memory holding the application's data.
You also want to make sure your memory system is set up properly; I
have seen too many benchmarks run on systems with poorly configured
memory.
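A minimal sketch of what that pinning looks like on a 2.6-era system with the numactl tool installed (the binary name here is just a placeholder):

```shell
# Show the node layout and per-node memory first:
numactl --hardware

# Run the benchmark bound to node 0's CPUs, with all of its
# memory allocations forced onto node 0's local memory:
numactl --cpunodebind=0 --membind=0 ./my_benchmark
```

With both the CPU and the memory policy bound to the same node, every access stays local and you avoid the cross-node hop.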
>- kernel comparisons preferably in quantifiable terms (e.g.
> RH vs Suse vs GNU 2.6, NUMA and O(1) scheduler support etc),
>
>
Don't use a 2.4 kernel if you can avoid it. The RH kernels (even with
backports) are ancient, and lots of good things are missing (IMO). SuSE
ships and supports 2.6, as do some others, and 2.6 knows far more about
NUMA.
>- real benefit of hypertransport etc.
>I'd suggest to leave compiler comparisons out.
>
>What bothers me primarily is that you have to run a benchmark
>many more times than usual to get the best performance on an Opteron.
>I've heard other people also mentioning "strange" Opteron behaviour.
>I'd actually suggest to report a standard deviation error
>with every performance figure. I've seen as large as 50% deviation
>in performance from run to run.
>
>
Reporting an SD is a wise thing generally (it is a measurement, and you
expect some width to it). There are far too many "benchmarks" out there
where people run their tests once, get a number, and have no sense of
how repeatable their measurement is. They are happy to draw conclusions
from it though.
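Igor's six serial timings quoted below make the point concretely; here is a quick sketch (Python purely for illustration) of reporting a mean with a sample standard deviation rather than a single best number:

```python
import statistics

# The six serial run times (seconds) reported below in this thread
# for an Opteron 244 under SuSE with a 2.4.19 kernel.
runs = [3254, 2579, 3258, 2582, 3258, 2658]

mean = statistics.mean(runs)   # ~2931.5 s
sdev = statistics.stdev(runs)  # sample SD, ~357 s, roughly 12% of the mean

print(f"{mean:.0f} +/- {sdev:.0f} s ({100 * sdev / mean:.0f}%)")
```

Quoting "2579 s" alone hides a spread of several hundred seconds; the mean-plus-SD form makes the repeatability (or lack of it) visible at a glance.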
>Here is one sequence of serial performance for example
>(time in seconds, Opteron244,Suse with 2.4.19 kernel):
>3254, 2579, 3258, 2582, 3258, 2658. Clearly if I state that the
>Opteron can do the job in 2579 sec it makes little practical sense
>because in 50% cases it will do 26% slower.
>I tested the same executable with GNU 2.6.8 kernel and
>was pleased to observe much smaller deviation
>(about twice as small but then I could not get quite the same
>best performance I had before: 2632 vs 2579 previously).
>
>
If you look carefully at the numbers, they are not uniformly
distributed. You appear to have a bimodal distribution, which is
consistent with processor affinity problems on a dual-processor
machine. I would guess that the higher numbers represent the cases
where the memory for CPMD was attached to processor 0 while the code
itself ran on processor 1. This means an HT hop to get the data. You
can force affinity using some of the affinity scheduler tools (Robert
Love's schedutils, http://tech9.net/rml/schedutils/).
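A minimal sketch of forcing affinity from inside the program itself, via Linux's sched_setaffinity call (shown here through Python's os module; it assumes CPU 0 exists and is available to the process):

```python
import os

# Restrict this process to CPU 0 so the scheduler cannot migrate it
# away from the memory it has already allocated on that node.
os.sched_setaffinity(0, {0})    # pid 0 means the calling process

print(os.sched_getaffinity(0))  # prints {0}
```

Robert Love's schedutils package exposes the same call from the command line as `taskset` (e.g. `taskset -c 0 ./app`), which is handy when you cannot modify the application.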
>I must admit that the application is memory bandwidth hungry
>and generally the more the application is memory intensive
>the higher the deviation. In the case above some of the memory
>had to be used off the second cpu.
>
>
The ancient kernels (a la the RH 2.4 series) all demonstrated this sort
of problem for me as well. The more modern kernels do much better.
>Also in a parallel application the kernel issues are more likely
>to be blurred out because of the averaging across many cpus.
>It is reasonable to imagine that had Opteron a better kernel
>the performance could have been better too.
>
>
Use 2.6. You will not be sorry.
>The latest disappointment was to observe CPMD performance
>on a quad Opteron 848 (2.2 GHz) vs a dual Opteron 246 (2 GHz):
>(CPMD 3.9.1, wat32 benchmark, run 1/ run 2)
>serial performance: 1963s/3395s vs 2204s/3466s - alright
> Opteron 848 is quicker.
>parallel: 223s/378s (4x4) vs 218s/360s (8x2) - Opteron 246 is
> quicker.
>How much should I expect from HT anyway? If it is not again
>about proper kernel support then I'd be better off having
>dual Opterons if I need to run CPMD.
>
>
There is more opportunity for contention in a quad than in a dual if
the scheduler does not know about NUMA or take processor affinity into
account when scheduling.
Joe
>
>Kind regards,
>Igor
>
>I. Kozin at dl.ac.uk
>CCLRC Daresbury Laboratory
>tel: 01925 603308
>http://www.cse.clrc.ac.uk/disco
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 612 4615