[Beowulf] bizarre scaling behavior on a Nehalem
    Bill Broadley 
    bill at cse.ucdavis.edu
       
    Wed Aug 12 11:19:59 PDT 2009
    
    
  
Gus Correa wrote:
> Hi Bill, list
> 
> Bill:  This is very interesting indeed.  Thanks for sharing!
> 
> Bill's graph seems to show that Shanghai and Barcelona scale
> (almost) linearly with the number of cores, whereas Nehalem stops
> scaling and flattens out at 4 cores.
Right.  That's not really surprising, since the Core i7 has only 4 cores.  I
wasn't testing a dual-socket Nehalem.  So on the single-socket Core i7 that I
tested, hyperthreading provided no additional performance.  Not too
surprising, since hyperthreading is about sharing idle functional units, but
it doesn't do much when the cache or memory system is saturated.
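A minimal sketch of this kind of thread-scaling measurement, assuming a STREAM-style triad (a[i] = b[i] + s*c[i]) as the memory-bound kernel and POSIX threads. The function triad_bandwidth and its parameters are hypothetical and illustrative, not the benchmark behind the graph. Running it with 4 and then 8 threads on a 4-core, 8-thread Core i7, at an array size far beyond the caches, shows whether the extra hyperthreads add any bandwidth once memory is the bottleneck.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* One thread's slice [lo, hi) of the arrays (hypothetical layout). */
typedef struct {
    double *a;
    const double *b, *c;
    size_t lo, hi;
    double scalar;
    int reps;
} triad_job;

/* STREAM-style triad on this thread's slice: a[i] = b[i] + scalar * c[i]. */
static void *triad_worker(void *arg) {
    triad_job *j = arg;
    for (int r = 0; r < j->reps; r++)
        for (size_t i = j->lo; i < j->hi; i++)
            j->a[i] = j->b[i] + j->scalar * j->c[i];
    return NULL;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Aggregate triad bandwidth in GB/s for n-element arrays split across
 * nthreads threads. Counts 24 bytes per element per rep (two loads, one
 * store) as STREAM does; write-allocate traffic is not counted.
 * Returns -1.0 on allocation, thread-creation, or result-check failure. */
double triad_bandwidth(size_t n, int nthreads, int reps) {
    assert(n > 0 && nthreads > 0 && reps > 0);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    triad_job *job = malloc((size_t)nthreads * sizeof *job);
    double gbps = -1.0;
    if (a && b && c && tid && job) {
        /* Initialized by the main thread, so pages land on its NUMA node:
         * fine for one socket, but a dual-socket run should init in parallel. */
        for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }
        int started = 0;
        double t0 = now_sec();
        for (int t = 0; t < nthreads; t++) {
            job[t] = (triad_job){ .a = a, .b = b, .c = c,
                                  .lo = n * (size_t)t / (size_t)nthreads,
                                  .hi = n * (size_t)(t + 1) / (size_t)nthreads,
                                  .scalar = 3.0, .reps = reps };
            if (pthread_create(&tid[t], NULL, triad_worker, &job[t]) != 0)
                break;
            started++;
        }
        for (int t = 0; t < started; t++)
            pthread_join(tid[t], NULL);
        double elapsed = now_sec() - t0;
        int ok = (started == nthreads && elapsed > 0.0);
        for (size_t i = 0; ok && i < n; i++)
            ok = (a[i] == 7.0);                 /* 1 + 3 * 2 */
        if (ok)
            gbps = 3.0 * sizeof(double) * (double)n * reps / elapsed / 1e9;
    }
    free(a); free(b); free(c); free(tid); free(job);
    return gbps;
}
```

For example, calling triad_bandwidth(1 << 24, t, 10) for t = 1 through 8 gives one curve of the kind being discussed (build with something like -O2 -pthread). The absolute numbers depend on the machine and compiler. Thread creation is inside the timed region, so reps should be large enough to amortize it.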
> The Nehalem 8 cores and 4 cores curves are virtually indistinguishable,
Yes, but it was 8 threads on 4 cores vs. 4 threads on 4 cores.  I'd expect
something less memory-intensive and more CPU-intensive to show a big
difference.  In fact, many of the HPC codes I've tried do see a benefit from
hyperthreading.
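As a contrast to the memory-bound case, here is a sketch of a CPU-bound kernel where hyperthreading has more room to help. It assumes a single dependent multiply-add chain per thread: each step waits on the previous result, so one thread leaves execution units idle that an SMT sibling could use. The name chain_rate and the constants are hypothetical. This is not one of the HPC codes mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Per-thread state; result is stored so the compiler cannot drop the loop. */
typedef struct {
    long iters;
    double result;
} chain_job;

/* One dependent multiply-add chain: every step needs the previous value,
 * so a single thread is bound by floating-point latency rather than by
 * memory bandwidth or execution-unit throughput. */
static void *chain_worker(void *arg) {
    chain_job *j = arg;
    double x = 1.0;       /* 1.0 is (approximately) the chain's fixed point */
    for (long i = 0; i < j->iters; i++)
        x = x * 0.999999 + 1e-6;
    j->result = x;
    return NULL;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Aggregate throughput in millions of chain steps per second when
 * nthreads threads each run iters steps. Returns -1.0 on failure. */
double chain_rate(int nthreads, long iters) {
    assert(nthreads > 0 && iters > 0);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    chain_job *job = malloc((size_t)nthreads * sizeof *job);
    double rate = -1.0;
    if (tid && job) {
        int started = 0;
        double t0 = now_sec();
        for (int t = 0; t < nthreads; t++) {
            job[t] = (chain_job){ .iters = iters, .result = 0.0 };
            if (pthread_create(&tid[t], NULL, chain_worker, &job[t]) != 0)
                break;
            started++;
        }
        for (int t = 0; t < started; t++)
            pthread_join(tid[t], NULL);
        double elapsed = now_sec() - t0;
        if (started == nthreads && elapsed > 0.0)
            rate = (double)nthreads * (double)iters / elapsed / 1e6;
    }
    free(tid);
    free(job);
    return rate;
}
```

On a 4-core, 8-thread part, comparing chain_rate(4, N) with chain_rate(8, N) for a large N would be expected to show a clearer gain from the extra hardware threads than the triad does, though the size of that gain depends on the core's FP latencies and should be measured rather than assumed.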
> and for very large arrays 4 cores is ahead.
> Only for huge arrays (>16M) does Nehalem get ahead
> of Shanghai and Barcelona.
Yes, it's impressive that a single-socket Intel has more main memory bandwidth
than a dual-socket Shanghai.
> Did I interpret the graph right?
> Wasn't this the type of scaling problem that plagued
> the Clovertown and Harpertown?
Heh, the single-socket Core i7 I mentioned has substantially more (2-4x) memory
bandwidth than the previous-generation Intel chips.
> Any possibility that kernels, BIOS, etc, are not yet ready for Nehalem?
They look good to me.  I'm still trying to find out why I don't see better
performance inside L1, though.
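One way to look at the in-cache end of the curve is a single-threaded sweep over array sizes, sketched below under the assumption that the same triad kernel is used and that a fixed total amount of work per size is acceptable. The function triad_gbps_at_size and its work parameter are hypothetical. Small arrays get many repetitions, so the timer overhead is amortized even when everything fits in L1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* One STREAM-style triad pass: a[i] = b[i] + scalar * c[i]. */
static void triad_once(double *a, const double *b, const double *c,
                       size_t n, double scalar) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Single-threaded triad bandwidth in GB/s for n-element arrays. The pass is
 * repeated until about `work` element updates have been timed, so small,
 * cache-resident arrays get many repetitions. Returns -1.0 on allocation
 * or result-check failure. */
double triad_gbps_at_size(size_t n, long work) {
    assert(n > 0 && work > 0);
    long reps = work / (long)n;
    if (reps < 1)
        reps = 1;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double gbps = -1.0;
    if (a && b && c) {
        for (size_t i = 0; i < n; i++) {
            a[i] = 0.0;
            b[i] = 1.0;
            c[i] = 2.0;
        }
        triad_once(a, b, c, n, 3.0);  /* untimed warm-up: pulls arrays into cache if they fit */
        double t0 = now_sec();
        for (long r = 0; r < reps; r++)
            triad_once(a, b, c, n, 3.0);
        double elapsed = now_sec() - t0;
        if (elapsed > 0.0 && a[0] == 7.0 && a[n - 1] == 7.0)
            gbps = 3.0 * sizeof(double) * (double)n * (double)reps / elapsed / 1e9;
    }
    free(a);
    free(b);
    free(c);
    return gbps;
}
```

Sweeping n over powers of two (say 1 << 8 up to 1 << 24) with a fixed work value and printing the working set (24 * n bytes) next to each result should show where the curve steps down as the arrays leave each cache level. The in-L1 numbers in particular are sensitive to compiler flags and whether the loop gets vectorized, so they are worth checking with more than one build.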
    
    