[Beowulf] Woodcrest Memory bandwidth

Joe Landman landman at scalableinformatics.com
Mon Aug 14 13:26:04 PDT 2006

Jason Holmes wrote:
> Joe Landman wrote:
>> I have it on good authority that with the other chipset (we have a 
>> Blackford here), we should see higher numbers.  Not exceeding the 
>> Opteron 275 though.
> We have both a greencreek based (the one with the snoop filter.. I guess 
> they're calling it the 5000x now) and a blackford based system and I'm 
> not seeing any difference beyond a percent or two of performance either 
> in stream or in real applications.  Maybe we just haven't hit the magic 
> app that greencreek helps yet.

There is some magic incantation to turn on some super-magic feature.  I 
think it is like an "11" setting for the memory system for specific 
workloads.  Since I don't have that one, I can't play with it :(

>> What I can say is that Woodcrest is interesting.  It just may be 
>> overhyped by a "compliant" media.
> I'd say it's a bit overhyped, but it is giving us some good real 
> application numbers compared to the opterons.  On NAMD, Amber, and VASP 
> so far, we've seen 2.66 GHz Woodcrest between 10-20% faster than our 2.4 
> GHz dual-core Opterons while using all 4 processors in a box.  It's not 
> the 50% media hype numbers, but it's definitely an improvement on the 
> Intel side.

I agree it's an improvement for Intel.  My question is how much of that 
is due to a larger cache per core on Woodcrest?  Memory bandwidth 
doesn't look like it's quite where I wanted it to be.  I am seeing some 
anomalous results with GAMESS that I need to try to understand better. 
On some things the 275 is within a percent or so of the 2.66 GHz Woodcrest, 
and on others the Woodcrest is like 70% faster.  It is not showing the 
latter on all, or even a majority of, workloads.

10%-20% ain't bad, but if it is all due to clock, or all due to cache, 
then that's not as good.  That's an ephemeral lead.  It will change.

While I would like to take the hype at face value, it is after all hype, 
and marketing fluff.  The only important data is where the binary meets 
the silicon ... :) running real apps with real input decks the way 
people really want to run them.  Nothing correlates as well with 
application performance as the application itself.

I want to get as much testing in with the 2.66 as I can before I change up 
to the 3 GHz parts.  Once I have that, I will hopefully be able to get a 
clearer sense of what is clock dependent and what comes from architectural 
changes.
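
A rough way to separate the clock contribution from everything else is to 
divide the measured speedup by the clock ratio.  The sketch below is 
illustrative only: the function name is hypothetical, and it assumes 
performance scales linearly with clock frequency, which real codes 
(especially memory-bound ones) often do not.  The inputs are the figures 
quoted above: 2.66 GHz Woodcrest, 2.4 GHz dual-core Opteron, 10%-20% 
faster.

```python
def per_clock_speedup(speedup, clock_new_ghz, clock_old_ghz):
    """Return the speedup left over after dividing out the clock ratio.

    Assumption (not from the thread): run time scales inversely with
    clock frequency, so the clock ratio is the gain from clock alone.
    """
    clock_ratio = clock_new_ghz / clock_old_ghz
    return speedup / clock_ratio


if __name__ == "__main__":
    # Figures quoted in the thread: 2.66 GHz Woodcrest vs 2.4 GHz Opteron,
    # observed application speedups of 10% and 20%.
    for observed in (1.10, 1.20):
        residual = per_clock_speedup(observed, 2.66, 2.40)
        print(f"observed {observed:.2f}x -> per-clock {residual:.3f}x")
```

Under that assumption, the 2.66/2.4 clock ratio is about 1.11, so a 10% 
gain is essentially all clock, while a 20% gain leaves roughly 8% to be 
explained by cache size or other architectural differences.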


> Thanks,
> -- 
> Jason Holmes


Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
