[Beowulf] Opteron performance

Kozin, I (Igor) i.kozin at dl.ac.uk
Fri Nov 26 03:29:29 PST 2004


Regarding Opteron performance and performance in general.

To the Opteron users out there, would you mind sharing your
experience to date regarding Opteron performance?

It would be particularly interesting to hear about
- what the problems are (if any),
- kernel comparisons, preferably in quantifiable terms (e.g.
  Red Hat vs SuSE vs vanilla 2.6, NUMA and O(1) scheduler support, etc.),
- the real benefit of HyperTransport, etc.
I'd suggest leaving compiler comparisons out.

What bothers me most is that on an Opteron you have to run a
benchmark many more times than usual to get the best performance.
I've heard other people mention "strange" Opteron behaviour too.
I'd actually suggest reporting the standard deviation with every
performance figure: I've seen run-to-run deviations in performance
as large as 50%.
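A minimal sketch of what such reporting could look like, assuming the benchmark can be launched as a single command. The helpers time_runs() and report() are hypothetical names: they run the command several times and print the best time together with the mean and sample standard deviation, rather than the best time alone. Measuring wall-clock time with time.perf_counter() is an assumption; the application's own timers may be more appropriate.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=6):
    """Run `cmd` (a list of argv strings) `runs` times; return wall-clock seconds per run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the benchmark exits non-zero
        times.append(time.perf_counter() - start)
    return times


def report(times):
    """Format best, mean and sample standard deviation (n - 1) of a list of timings."""
    best = min(times)
    mean = statistics.mean(times)
    sd = statistics.stdev(times) if len(times) > 1 else 0.0
    return (f"best {best:.2f} s, mean {mean:.2f} s +/- {sd:.2f} s "
            f"({100 * sd / mean:.1f}%) over {len(times)} runs")


if __name__ == "__main__":
    # Stand-in command; replace with the real benchmark, e.g. ["./my_benchmark"] (hypothetical name).
    print(report(time_runs([sys.executable, "-c", "pass"], runs=3)))
```

Quoting "best / mean +/- sd over N runs" makes it obvious when a best-case figure is not representative of a typical run.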

Here, for example, is one sequence of serial run times
(in seconds; Opteron 244, SuSE with the 2.4.19 kernel):
3254, 2579, 3258, 2582, 3258, 2658. Clearly, stating that the
Opteron can do the job in 2579 s makes little practical sense,
because in half of the runs it is about 26% slower.
I tested the same executable with the vanilla 2.6.8 kernel and
was pleased to observe a much smaller deviation (about half as
large), but then I could not quite reach the best time I had
before: 2632 s vs 2579 s previously.
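To make the arithmetic above explicit, here is a short sketch that takes the six 2.4.19 timings and the 2.6.8 best time quoted above and computes the mean, the sample standard deviation, and the slowdowns relative to the best run. The 20% threshold used to count "slow" runs is an arbitrary choice for illustration.

```python
import statistics

# Serial run times in seconds quoted above (Opteron 244, SuSE, 2.4.19 kernel).
runs_2_4 = [3254, 2579, 3258, 2582, 3258, 2658]
best_2_6 = 2632  # best time observed with the 2.6.8 kernel

best = min(runs_2_4)
mean = statistics.mean(runs_2_4)
sd = statistics.stdev(runs_2_4)  # sample standard deviation (n - 1)
worst_slowdown = max(runs_2_4) / best - 1
# Fraction of runs more than 20% slower than the best (threshold chosen for illustration).
slow_fraction = sum(t > 1.2 * best for t in runs_2_4) / len(runs_2_4)

print(f"best {best} s, mean {mean:.1f} s +/- {sd:.1f} s ({100 * sd / mean:.1f}%)")
print(f"slowest run is {100 * worst_slowdown:.0f}% slower than the best")
print(f"{100 * slow_fraction:.0f}% of runs are more than 20% slower than the best")
print(f"2.6.8 best is {100 * (best_2_6 / best - 1):.1f}% slower than the 2.4.19 best")
```

For these numbers the standard deviation comes out at roughly 357 s, about 12% of the 2931.5 s mean, while the best 2.6.8 time is about 2% behind the best 2.4.19 time.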

I must admit that the application is memory-bandwidth hungry, and
in general the more memory-intensive the application, the higher
the deviation. In the case above, some of the memory had to be
allocated on the second CPU's memory bank and accessed remotely.
Also, in a parallel application kernel issues are more likely to
be blurred out by averaging across many CPUs.
It is reasonable to imagine that with a better kernel the Opteron
could have performed better too.

My latest disappointment was CPMD performance on a quad
Opteron 848 (2.2 GHz) vs a dual Opteron 246 (2.0 GHz)
(CPMD 3.9.1, wat32 benchmark, times given as run 1 / run 2):
serial:   1963 s / 3395 s (848) vs 2204 s / 3466 s (246)
   - fine, the Opteron 848 is quicker.
parallel:  223 s /  378 s (848, 4x4) vs 218 s / 360 s (246, 8x2)
   - the Opteron 246 is quicker.
How much should I expect from HyperTransport (HT) anyway? Unless
this again comes down to proper kernel support, I'd be better off
with dual Opterons if I need to run CPMD.
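The CPMD figures can also be expressed as parallel speedup and efficiency. This sketch assumes that "4x4" and "8x2" mean nodes x CPUs per node, i.e. 16 processes in both cases, and that run 1 and run 2 pair up between the serial and parallel timings; neither is stated explicitly above. The scaling() helper is a hypothetical name.

```python
# CPMD 3.9.1 wat32 timings in seconds quoted above, as (run 1, run 2).
# Assumption: "4x4" and "8x2" mean nodes x CPUs per node, i.e. 16 processes each.
timings = {
    "Opteron 848 (4x4)": {"serial": (1963, 3395), "parallel": (223, 378), "cpus": 16},
    "Opteron 246 (8x2)": {"serial": (2204, 3466), "parallel": (218, 360), "cpus": 16},
}


def scaling(serial, parallel, cpus):
    """Return (speedup, parallel efficiency) for each (serial, parallel) timing pair."""
    return [(ts / tp, ts / tp / cpus) for ts, tp in zip(serial, parallel)]


results = {name: scaling(t["serial"], t["parallel"], t["cpus"]) for name, t in timings.items()}

for name, rows in results.items():
    for run, (speedup, eff) in enumerate(rows, start=1):
        print(f"{name}, run {run}: speedup {speedup:.2f}, efficiency {100 * eff:.0f}%")
```

Under those assumptions the dual 246 configuration reaches a speedup of about 10.1 (run 1) and 9.6 (run 2), compared with about 8.8 and 9.0 on the quad 848, so the 848 system scales worse even from its faster serial baseline.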


Kind regards,
Igor

I. Kozin   at dl.ac.uk
CCLRC Daresbury Laboratory
tel: 01925 603308
http://www.cse.clrc.ac.uk/disco



