[Beowulf] Woodcrest Memory bandwidth
landman at scalableinformatics.com
Mon Aug 14 13:26:04 PDT 2006
Jason Holmes wrote:
> Joe Landman wrote:
>> I have it on good authority that with the other chipset (we have a
>> Blackford here), we should see higher numbers. Not exceeding the
>> Opteron 275 though.
> We have both a greencreek based (the one with the snoop filter.. I guess
> they're calling it the 5000x now) and a blackford based system and I'm
> not seeing any difference beyond a percent or two of performance either
> in stream or in real applications. Maybe we just haven't hit the magic
> app that greencreek helps yet.
There is some magic incantation to turn on some super-magic feature. I
think it is like an "11" setting for the memory system for specific
workloads. Since I don't have that one, I can't play with it :(
>> What I can say is that Woodcrest is interesting. It just may be
>> overhyped by a "compliant" media.
> I'd say it's a bit overhyped, but it is giving us some good real
> application numbers compared to the opterons. On NAMD, Amber, and VASP
> so far, we've seen 2.66 GHz Woodcrest between 10-20% faster than our 2.4
> GHz dual-core Opterons while using all 4 processors in a box. It's not
> the 50% media hype numbers, but it's definitely an improvement on the
> Intel side.
I agree it's an improvement for Intel. My question is how much of that
is due to a larger cache per core on Woodcrest? Memory bandwidth
doesn't look like it's quite where I wanted it to be. I am seeing some
anomalous results with GAMESS that I need to try to understand better.
On some things the 275 is within a percent or so of the 2.66 GHz Woodcrest,
and on others the Woodcrest is like 70% faster. It is not showing the
latter on all, or even a majority of, workloads.
10%-20% ain't bad, but if it is all due to clock, or all due to cache,
then that's not as good. That's an ephemeral lead. It will change.
While I would like to take the hype at face value, it is after all hype,
and marketing fluff. The only important data is where the binary meets
the silicon ... :) running real apps with real input decks the way
people really want to run them. Nothing correlates as well with
application performance as the application itself.
I want to get as much testing in with the 2.66 before I change up to
the 3 GHz parts. Once I have that, I hopefully will be able to get a
clearer sense of what is clock dependent and what is due to
architectural changes.
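
As a rough back-of-the-envelope sketch of that clock-versus-architecture split, one can divide a measured speedup by the clock ratio. This assumes performance scales linearly with clock, which is a simplification (memory-bound codes usually scale worse), and the function name is just something made up for illustration. The 2.66 GHz and 2.4 GHz figures and the 10-20% range are the ones Jason quoted above.

```python
def clock_normalized_speedup(speedup, clock_a_ghz, clock_b_ghz):
    """Return the speedup of A over B left over after dividing out clock.

    speedup: measured ratio time_b / time_a (1.10 means "A is 10% faster").
    Assumes linear scaling with clock, which is only an approximation.
    """
    clock_ratio = clock_a_ghz / clock_b_ghz
    return speedup / clock_ratio


WOODCREST_GHZ = 2.66  # clock quoted in this thread
OPTERON_GHZ = 2.4     # clock of the dual-core Opterons quoted in this thread

for measured in (1.10, 1.20):
    per_clock = clock_normalized_speedup(measured, WOODCREST_GHZ, OPTERON_GHZ)
    print(f"measured {measured:.2f}x -> per-clock {per_clock:.3f}x")
```

Under that assumption, the 10% end of the range comes out at about parity per clock (roughly 0.99x), and the 20% end leaves roughly 8% for cache and architecture combined. It cannot separate cache from architecture; that still needs the runs at a second clock speed.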
> Jason Holmes
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615