[Beowulf] Strange Opteron 2350 performance: Gaussian-03

Sat Jun 28 09:37:12 PDT 2008

Hello,
Sorry, I don't have the same applications as you.
Did you compile them with gcc? If so, -O3 enables more optimization.
-march=k8 is enough, I think.
Also make sure the CPU is running at its default frequency. Sometimes PowerNow! is active by default.
And BTW, what's your platform? Linux? Which release? X86_64?
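
For the frequency check, here is a minimal sketch, assuming a Linux kernel that exposes the cpufreq interface under /sys/devices/system/cpu. The helper names read_cpufreq and is_scaled_down are only illustrative, and the 5% tolerance is an arbitrary choice. It reads the governor plus the current and maximum frequency of one core and reports whether the core is clocked below its maximum:

```python
from pathlib import Path


def read_cpufreq(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Read governor and frequencies (MHz) for one core from cpufreq sysfs.

    Entries that are missing (e.g. no cpufreq driver loaded) come back as None.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    def khz_to_mhz(text):
        # sysfs reports frequencies in kHz
        return int(text) / 1000.0 if text else None

    return {
        "governor": read("scaling_governor"),
        "cur_mhz": khz_to_mhz(read("scaling_cur_freq")),
        "max_mhz": khz_to_mhz(read("cpuinfo_max_freq")),
    }


def is_scaled_down(info, tolerance=0.05):
    """True if the core runs more than `tolerance` below its maximum frequency."""
    if info["cur_mhz"] is None or info["max_mhz"] is None:
        return None  # cannot tell without cpufreq data
    return info["cur_mhz"] < info["max_mhz"] * (1.0 - tolerance)


if __name__ == "__main__":
    info = read_cpufreq(0)
    print(info, "scaled down:", is_scaled_down(info))
```

The current frequency is only a snapshot, so it is worth reading it while the benchmark is actually running rather than on an idle machine.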
Regards,
Li, Bo
----- Original Message ----- 
From: "Mikhail Kuzminsky" <kus at free.net>
To: "Li, Bo" <libo at buaa.edu.cn>
Cc: <beowulf at beowulf.org>
Sent: Sunday, June 29, 2008 12:23 AM
Subject: Re: [Beowulf] Strange Opteron 2350 performance: Gaussian-03


> In message from "Li, Bo" <libo at buaa.edu.cn> (Sun, 29 Jun 2008 00:07:07 
> +0800):
>>Hello,
>>I am afraid there must be something wrong with your experiment.
>>How did you measure the performance? Was your DFT code running in
>>parallel? Was any optimization involved?
> 
> I feared the same, but the results were reproduced twice.
> 
> As I wrote in my message:
> 
> - these were ONE-CORE runs (one CPU for the Opteron 246)
> - the binary was optimized for the OLD Opteron 246 (because
> Gaussian, Inc. does not offer binaries optimized specifically for
> Barcelona)
> 
> DFT test397 (like any other DFT job) parallelizes well, and on Opteron
> 246 it gives a 1.9 times speedup on 2 CPUs. But I didn't run a 2-core
> parallel job on Opteron 2350: I was troubled by the results obtained
> for 1 core.
> 
>>In most of my tests, K8L or K10 beats the old Opteron at the same
>>frequency by about 20%.
> 
> Sorry, did you see this with Gaussian-03, and for DFT in particular? Did
> you compile it on K10 using target=barcelona (i.e. optimized for
> Barcelona)?
> 
> Yours
> Mikhail
> 
>>Regards,
>>Li, Bo
>>----- Original Message ----- 
>>From: "Mikhail Kuzminsky" <kus at free.net>
>>To: <beowulf at beowulf.org>
>>Sent: Saturday, June 28, 2008 11:48 PM
>>Subject: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
>>
>>
>>> I'm running a set of quad-core Opteron 2350 benchmarks, in particular
>>> using Gaussian-03 (binary version from Gaussian, Inc., i.e. compiled
>>> with an older than current pgf77 version, for the Opteron target).
>>> 
>>> In particular, I compare *one core* of Opteron 2350 with Opteron 246,
>>> which has the same 2 GHz frequency and the same amount of cache per
>>> core (512K L2 + 0.25*2 MB L3 for Opteron 2350 equals the 1 MB L2 of
>>> Opteron 246). Opteron 246 has even faster DDR2-667 RAM.
>>> 
>>> The Gaussian-03 performance in some cases is close for both Opterons
>>> (remember that the compilation didn't know about Barcelona!), but for
>>> the very popular DFT method the Opteron 2350 cores look slow: one job
>>> gives 33% worse performance (than Opteron 246).
>>> 
>>> But on the standard Gaussian-03 test397.com DFT/B3LYP test, *one* (1)
>>> Opteron 2350 core ran 15667 sec. (both elapsed and CPU time) vs 8709
>>> sec. on (one) Opteron 246 !!
>>> 
>>> There is no powersaved daemon, so the frequency of Opteron 2350 is
>>> fixed at 2 GHz. I reproduced this result twice on Opteron 2350,
>>> one of those times with forced good numactl behaviour. I'm
>>> reproducing it on Opteron 246 again :-) but I have indirect
>>> confirmation of these timings (based on a 2-CPU Opteron 246 parallel
>>> test).
>>> 
>>> Yes, AFAIK the DFT method is cache-friendly, and the slower L3 cache in
>>> Opteron 2350 may give worse performance. But by 1.8 times ??
>>> 
>>> Any comments are welcome.
>>> 
>>> Mikhail Kuzminsky
>>> Computer Assistance to Chemical Research Center
>>> Zelinsky Institute of Organic Chemistry
>>> Moscow
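
The figures quoted in the original message can be checked with a few lines of arithmetic. This is only a sketch: the timings and cache sizes are taken from the message, while the function names are made up for illustration.

```python
def slowdown(t_slow_sec, t_fast_sec):
    """Ratio of run times: how many times longer the slower run took."""
    return t_slow_sec / t_fast_sec


def cache_per_core_kb(l2_kb, l3_shared_kb=0, cores_sharing_l3=1):
    """Private L2 plus an equal share of the shared L3, in KB."""
    return l2_kb + l3_shared_kb / cores_sharing_l3


if __name__ == "__main__":
    # test397 timings from the message: 15667 s (Opteron 2350) vs 8709 s (Opteron 246)
    ratio = slowdown(15667, 8709)
    print(f"Opteron 2350 takes {ratio:.2f}x as long ({(ratio - 1) * 100:.0f}% more time)")

    # 512 KB L2 + a quarter of the 2 MB shared L3, compared with 1 MB L2
    print(cache_per_core_kb(512, 2048, 4), "KB vs", cache_per_core_kb(1024), "KB")
```

With these timings the ratio comes out at about 1.80, which matches the "1.8 times" in the message, and the per-core cache totals are the same 1 MB on both processors.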