[Beowulf] recommendations for cluster upgrades

Greg Keller Greg at keller.net
Tue May 12 17:05:41 PDT 2009


> I'm currently shopping around for a cluster-expansion and was shopping
> for options. Anybody out here who's bought new hardware in the  recent
> past? Any suggestions? Any horror stories?
Nehalem is a huge step forward for memory bandwidth hogs.  We have one
code that is extremely memory bandwidth sensitive and runs 400% faster
than on our previous "Harpertown" quad core solution from Intel.  This
is the cumulative effect of having enough memory BW to keep all the
cores busy, plus better memory and cache performance.  It's a must-see
if you suspect memory bandwidth is important, and until the next gen of
AMD is available with 4 channels per socket it's going to win a lot of
business from me.
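
A rough way to see why bandwidth can matter more than core count is to
cap a code's rate at the lower of the compute limit and the memory
limit.  The sketch below is only a back-of-envelope model, not
something from our benchmarking; the function name and every number in
it are made-up placeholders, so plug in your own measured figures.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on sustained GFLOP/s for one node.

    The code can go no faster than the compute limit (peak_gflops) and no
    faster than memory can feed it (bandwidth_gb_s * flops_per_byte).
    """
    memory_limit = bandwidth_gb_s * flops_per_byte
    return min(peak_gflops, memory_limit)


# Hypothetical placeholder figures, NOT measurements of any real system.
PEAK = 64.0        # GFLOP/s the cores could do if never starved
OLD_BW = 10.0      # GB/s sustained on the older node (assumed)
NEW_BW = 30.0      # GB/s sustained on the newer node (assumed)
INTENSITY = 0.125  # flops done per byte moved (assumed, bandwidth-bound)

old = attainable_gflops(PEAK, OLD_BW, INTENSITY)
new = attainable_gflops(PEAK, NEW_BW, INTENSITY)
print(f"old bound: {old:.2f} GFLOP/s, new bound: {new:.2f} GFLOP/s")
```

With an intensity that low, both bounds sit far below the compute
limit, so in this model extra cores buy nothing and extra bandwidth
buys nearly everything.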

> For some reason the top500 sublists seem skewed to prefer the Intel
> Xeons. Why so few Opterons or any other AMD hardware? Just curious if
> this is driven by technological inferiority or only a marketing
> effect. My vendor seems to be trying to steer me towards an Intel
> Nehalem or Clovertown for whatever reasons good or bad.
AMD Barcelona was the first 4 flops per cycle processor from AMD, and
it hit the street with some problems right when the list was coming
out at the end of 2007.  I expect those are great processors now (and a
more technically challenging design than Intel's in some regards), but
that timing may have skewed the top500 list, because Intel's 4 flops
per cycle chips (Woodcrest/Clovertown) were out a year earlier.
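
For anyone wondering why flops per cycle matters so much for a list
ranking, theoretical peak is just a product of counts.  A minimal
sketch follows; the function name is my own and the node configuration
is a hypothetical example, not a specific product.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak GFLOP/s for one node: a simple product of counts."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


# Hypothetical dual-socket, quad-core node at an assumed 2.0 GHz clock.
four_wide = peak_gflops(2, 4, 2.0, 4)  # 4 flops per cycle per core
two_wide = peak_gflops(2, 4, 2.0, 2)   # 2 flops per cycle per core
print(f"4 flops/cycle: {four_wide} GFLOP/s, 2 flops/cycle: {two_wide} GFLOP/s")
```

At the same clock and core count, doubling flops per cycle doubles the
peak number on paper, whatever the code actually sustains.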

The essential difference... Intel puts two "dual core" dies into a
single processor package to get their "Quad Core" Clovertown.  Each
pair of cores shares some cache, but no cache is shared across all 4.
AMD put all 4 cores on one die with cache shared across all of them,
and did some neat tricks with the floating point units, which had the
side effect of leaving less cache per core and a longer time to market.

Hope this helps.... if you really want to stretch your upgrade $$ you
may be able to swap the processors in your existing 1435s for the
latest chip, which sounds good on paper at least.  I'd love to hear if
anyone actually does that type of upgrade on "real" production clusters.

Full Disclosure: I am a former Dell HPCC architect and a Dell customer.


