[Beowulf] Strange Opteron 2350 performance: Gaussian-03
Mikhail Kuzminsky
kus at free.net
Sat Jun 28 09:23:42 PDT 2008
In message from "Li, Bo" <libo at buaa.edu.cn> (Sun, 29 Jun 2008 00:07:07
+0800):
>Hello,
>I am afraid there must be something wrong with your experiment.
>How did you measure the performance? Were your DFT codes running in
>parallel? Was any optimization involved?
I was afraid of the same, but the results have been reproduced twice.
As I wrote in my message:
- these were ONE-CORE runs (one CPU for Opteron 246)
- the binary was optimized for the OLD Opteron 246 (because
Gaussian, Inc. does not offer binaries optimized specifically for
Barcelona)
DFT test397 (like any other DFT job) parallelizes well, and on Opteron
246 it gives a 1.9x speedup on 2 CPUs. But I didn't run a 2-core
parallel job on Opteron 2350: I was struck by the results obtained
for 1 core.
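
To put the 1.9x figure in perspective, here is a minimal sketch, assuming
Amdahl's law is a fair model for this job (that is my assumption, not
something measured). The function names are only illustrative.

```python
def parallel_efficiency(speedup, n_cpus):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_cpus


def amdahl_serial_fraction(speedup, n_cpus):
    """Serial fraction s implied by Amdahl's law.

    S = 1 / (s + (1 - s) / n)  =>  s = (n / S - 1) / (n - 1)
    """
    return (n_cpus / speedup - 1.0) / (n_cpus - 1.0)


# Figures from the Opteron 246 run: 1.9x speedup on 2 CPUs.
eff = parallel_efficiency(1.9, 2)
serial = amdahl_serial_fraction(1.9, 2)

print(f"parallel efficiency: {eff:.0%}")
print(f"implied serial fraction: {serial:.1%}")
```

Under that assumption the job is about 95% efficient on 2 CPUs, with
roughly 5% of the work serial.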
>In most of my tests, K8L or K10 beats the old Opteron at the same
>frequency by about 20%.
Sorry, do you see this with Gaussian-03, and for DFT in particular? Did
you compile it on K10 using target=barcelona (i.e. optimized for
Barcelona)?
Yours
Mikhail
>Regards,
>Li, Bo
>----- Original Message -----
>From: "Mikhail Kuzminsky" <kus at free.net>
>To: <beowulf at beowulf.org>
>Sent: Saturday, June 28, 2008 11:48 PM
>Subject: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
>
>
>> I'm running a set of quad-core Opteron 2350 benchmarks, in particular
>> using Gaussian-03 (the binary version from Gaussian, Inc., i.e. compiled
>> with an older-than-current pgf77 version, for the Opteron target).
>>
>> In particular, I compare *one core* of Opteron 2350 with Opteron 246,
>> which has the same 2 GHz frequency and the same amount of cache per
>> core (512K L2 + 0.25*2 MB L3 for Opteron 2350 equals the 1 MB L2 of
>> Opteron 246). Opteron 2350 even has faster DDR2-667 RAM.
>>
>> The Gaussian-03 performance is in some cases close for both Opterons
>> (keep in mind that the compilation knew nothing about Barcelona!), but
>> for the very popular DFT method the Opteron 2350 cores look slow: one
>> job gives 33% worse performance than on Opteron 246.
>>
>> But on the standard Gaussian-03 test397.com DFT/B3LYP test, *one* (1)
>> Opteron 2350 core ran for 15667 sec. (both start-stop and CPU time)
>> vs 8709 sec. on (one) Opteron 246!!
>>
>> There is no powersaved daemon, so the frequency of Opteron 2350 is
>> fixed at 2 GHz. I reproduced this result twice on Opteron 2350, one
>> of those times forcing good placement with numactl. I'm
>> reproducing it on Opteron 246 again :-) but I have indirect
>> confirmation of these timings (based on a 2-CPU Opteron 246 parallel
>> test).
>>
>> Yes, AFAIK the DFT method is cache-friendly, and the slower L3 cache in
>> Opteron 2350 may hurt performance. But by a factor of 1.8??
>>
>> Any comments are welcome.
>>
>> Mikhail Kuzminsky
>> Computer Assistance to Chemical Research Center
>> Zelinsky Institute of Organic Chemistry
>> Moscow
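
For reference, the arithmetic behind the "1.8 times" and "same cache per
core" statements in the quoted message can be checked with a short sketch.
It uses only the numbers quoted in the message; the variable names are
illustrative, and the even split of the shared L3 across four cores is the
same simplification the message makes.

```python
# Single-core test397 timings quoted in the message, in seconds.
t_opteron_2350 = 15667
t_opteron_246 = 8709

# How many times longer the Opteron 2350 core takes.
slowdown = t_opteron_2350 / t_opteron_246

# Cache per core in KB: 512 KB private L2 plus a quarter of the
# shared 2 MB L3 on Opteron 2350, vs 1 MB L2 on Opteron 246.
cache_2350_kb = 512 + (2 * 1024) // 4
cache_246_kb = 1 * 1024

print(f"slowdown factor: {slowdown:.2f}")
print(f"cache per core: {cache_2350_kb} KB vs {cache_246_kb} KB")
```

The ratio comes out at about 1.8, and both chips end up with 1024 KB of
cache per core under that even-split assumption.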