[Beowulf] bizarre scaling behavior on a Nehalem
bill at cse.ucdavis.edu
Wed Aug 12 11:19:59 PDT 2009
Gus Correa wrote:
> Hi Bill, list
> Bill: This is very interesting indeed. Thanks for sharing!
> Bill's graph seems to show that Shanghai and Barcelona scale
> (almost) linearly with the number of cores, whereas Nehalem stops
> scaling and flattens out at 4 cores.
Right. That's not really surprising, since the Core i7 has only 4 cores. I
wasn't testing a dual-socket Nehalem. So on the single-socket Core i7 that I
tested, hyperthreading provided no additional performance. Not too
surprising, since hyperthreading is about sharing idle functional units and
doesn't do much when the cache or memory system is saturated.
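For anyone who wants to see the cache-to-memory transition on their own box, here is a minimal sketch of a working-set sweep. It is not the benchmark behind the graph; it assumes Python with only the standard library, and the names copy_bandwidth and sweep and the size range are made up for illustration. It times a bulk buffer copy at increasing sizes in a single thread, so reproducing the 4 vs 8 thread comparison would need one such copy running per hardware thread.

```python
import time


def copy_bandwidth(nbytes, min_seconds=0.05):
    """Return approximate copy bandwidth in bytes/second for one buffer size."""
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    copies = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        dst[:] = src  # bulk copy, no per-element Python loop
        copies += 1
        elapsed = time.perf_counter() - start
    # Each copy reads nbytes and writes nbytes.
    return 2 * nbytes * copies / elapsed


def sweep(sizes):
    """Measure every size in turn and return (size, bytes/second) pairs."""
    return [(n, copy_bandwidth(n)) for n in sizes]


if __name__ == "__main__":
    # 4 KiB .. 64 MiB in steps of 4x (illustrative range, not from the graph)
    sizes = [2 ** k for k in range(12, 27, 2)]
    for n, bw in sweep(sizes):
        print(f"{n:>10d} bytes  {bw / 1e9:8.2f} GB/s")
```

At the small end the interpreter's per-call overhead dominates, so the numbers inside L1 will be understated; the useful part of the curve is where the buffers outgrow the last-level cache.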
> The Nehalem 8 cores and 4 cores curves are virtually indistinguishable,
Yes, but it was 8 threads on 4 cores vs. 4 threads on 4 cores. I'd expect
something less memory-intensive and more CPU-intensive would show a big
difference. In fact, many of the HPC codes I've tried see a benefit from
hyperthreading.
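A rough way to check that expectation is to scale a compute-bound kernel across threads and compare 4 against 8. The sketch below is only a stand-in, assuming Python's standard library: it hashes a small buffer with hashlib.sha256, which releases the GIL for inputs larger than 2047 bytes so the threads can run on separate cores. The names hash_worker and throughput, the 16 KiB buffer, and the round count are all illustrative choices, and no particular result is implied.

```python
import hashlib
import threading
import time


def hash_worker(buf, rounds):
    # hashlib releases the GIL for buffers larger than 2047 bytes,
    # so these threads can make progress in parallel.
    for _ in range(rounds):
        hashlib.sha256(buf).digest()


def throughput(nthreads, buf_bytes=16 * 1024, rounds=2000):
    """Return aggregate hashed bytes/second using nthreads threads."""
    buf = bytes(buf_bytes)  # assumed small enough to stay in cache
    threads = [
        threading.Thread(target=hash_worker, args=(buf, rounds))
        for _ in range(nthreads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return nthreads * rounds * buf_bytes / elapsed


if __name__ == "__main__":
    base = throughput(1)
    for n in (1, 2, 4, 8):
        print(f"{n} threads: {throughput(n) / base:.2f}x of 1 thread")
```

If the 8-thread figure lands clearly above the 4-thread one on a 4-core part, hyperthreading is helping that kernel; if the two are flat, something shared is already saturated.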
> and for very large arrays 4 cores is ahead.
> Only for huge arrays (>16M) Nehalem gets ahead
> of Shanghai and Barcelona.
Yes, impressive that a single-socket Intel has more main memory bandwidth than
a dual-socket Shanghai.
> Did I interpret the graph right?
> Wasn't this type of scaling problem that plagued
> the Clovertown and Harpertown?
Heh, the single-socket Core i7 mentioned above has substantially more (2-4x)
memory bandwidth than the previous-generation Intels.
> Any possibility that kernels, BIOS, etc, are not yet ready for Nehalem?
They look good to me. I'm still trying to find out why I don't see better
performance inside L1, though.