[Beowulf] Strange Opteron 2350 performance: Gaussian-03
Li, Bo
libo at buaa.edu.cn
Sat Jun 28 09:37:12 PDT 2008
Hello,
Sorry, I don't have the same applications as you.
Did you compile them with gcc? If so, -O3 (capital O) enables more aggressive optimization.
-march=k8 is enough I think.
Also make sure the CPU is running at its default frequency. Sometimes PowerNow is active by default.
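
A quick way to check the clocks on Linux, as a rough sketch only. It assumes /proc/cpuinfo reports one "cpu MHz" line per core, which is what x86 Linux kernels normally do; the function names and the 5% tolerance are my own choices for illustration.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU found in the text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text,
                         flags=re.MULTILINE)
    return [float(m) for m in matches]


def throttled_cores(cpuinfo_text, nominal_mhz, tolerance=0.05):
    """Return indices of cores clocked more than `tolerance` below nominal."""
    limit = nominal_mhz * (1.0 - tolerance)
    return [i for i, mhz in enumerate(parse_cpu_mhz(cpuinfo_text))
            if mhz < limit]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc not mounted
    print("cpu MHz per core:", parse_cpu_mhz(text))
    # 2000.0 is the nominal 2 GHz clock discussed in this thread
    print("cores below nominal:", throttled_cores(text, nominal_mhz=2000.0))
```

If the second list is not empty while the benchmark is running, frequency scaling is probably still active.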
And BTW, what's your platform? Linux? Which release? X86_64?
Regards,
Li, Bo
----- Original Message -----
From: "Mikhail Kuzminsky" <kus at free.net>
To: "Li, Bo" <libo at buaa.edu.cn>
Cc: <beowulf at beowulf.org>
Sent: Sunday, June 29, 2008 12:23 AM
Subject: Re: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
> In message from "Li, Bo" <libo at buaa.edu.cn> (Sun, 29 Jun 2008 00:07:07
> +0800):
>>Hello,
>>I am afraid there must be something wrong with your experiment.
>>How did you get the performance? Were your DFT codes running in
>>parallel? Any optimization involved?
>
> I was afraid of the same, but the results were reproduced twice.
>
> As I wrote in my message:
>
> - these were ONE-CORE runs (one CPU for Opteron 246)
> - the optimization was performed for the OLD Opteron 246 (because
> Gaussian, Inc. does not offer binaries optimized specifically for
> Barcelona)
>
> DFT test397 (like any other DFT job) parallelizes well, and on Opteron
> 246 it gives a 1.9 times speedup on 2 CPUs. But I didn't run a 2-core
> parallel job on Opteron 2350: I was too concerned by the results obtained
> for 1 core.
>
>>In most of my tests, K8L or K10 beats the old Opteron at the same
>>frequency by about 20%.
>
> Sorry, do you see this with Gaussian-03, and for DFT in particular? Did
> you compile it on K10 using target=barcelona (i.e. optimized for
> barcelona)?
>
> Yours
> Mikhail
>
>>Regards,
>>Li, Bo
>>----- Original Message -----
>>From: "Mikhail Kuzminsky" <kus at free.net>
>>To: <beowulf at beowulf.org>
>>Sent: Saturday, June 28, 2008 11:48 PM
>>Subject: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
>>
>>
>>> I'm running a set of quad-core Opteron 2350 benchmarks, in particular
>>> using Gaussian-03 (the binary version from Gaussian, Inc., i.e. compiled
>>> with an older - than current - pgf77 version, for the Opteron target).
>>>
>>> In particular, I compare *one core* of Opteron 2350 with Opteron 246,
>>> which has the same 2 GHz frequency and the same amount of cache per core
>>> (512 KB L2 + 0.25*2 MB L3 for Opteron 2350 equals the 1 MB L2 of Opteron
>>> 246). Opteron 246 has even faster DDR2-667 RAM.
>>>
>>> The Gaussian-03 performance in some cases is close for both Opterons
>>> (I remind you that the compilation didn't know about Barcelona!), but for
>>> the very popular DFT method the Opteron 2350 cores look slow: one job
>>> gives 33% worse performance (than Opteron 246).
>>>
>>> But on the standard Gaussian-03 test397.com DFT/B3LYP test, *one* (1)
>>> Opteron 2350 core runs 15667 sec. (both elapsed and CPU time) vs 8709
>>> sec. on (one) Opteron 246 !!
>>>
>>> There is no powersaved daemon, so the frequency of the Opteron 2350 is
>>> fixed at 2 GHz. I reproduced this result twice on Opteron 2350, one
>>> time with forced good numactl behaviour. I'm
>>> reproducing it on Opteron 246 again :-) but I have indirect
>>> confirmation of these timings (based on a 2-CPU Opteron 246 parallel
>>> test).
>>>
>>> Yes, AFAIK the DFT method is cache-friendly, and the slower L3 cache in
>>> Opteron 2350 may give worse performance. But by a factor of 1.8 ??
>>>
>>> Any comments are welcome.
>>>
>>> Mikhail Kuzminsky
>>> Computer Assistance to Chemical Research Center
>>> Zelinsky Institute of Organic Chemistry
>>> Moscow
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit
>>>http://www.beowulf.org/mailman/listinfo/beowulf
>