[Beowulf] bizarre scaling behavior on a Nehalem

Gus Correa gus at ldeo.columbia.edu
Wed Aug 12 11:09:04 PDT 2009


Hi Bill, list

Bill:  This is very interesting indeed.  Thanks for sharing!

Bill's graph seems to show that Shanghai and Barcelona scale
(almost) linearly with the number of cores, whereas Nehalem stops
scaling and flattens out at 4 cores.
The Nehalem 8-core and 4-core curves are virtually indistinguishable,
and for very large arrays 4 cores is ahead.
Only for huge arrays (>16M) does Nehalem get ahead
of Shanghai and Barcelona.

Did I interpret the graph right?
Wasn't this the type of scaling problem that plagued
Clovertown and Harpertown?
Any possibility that kernels, BIOS, etc., are not yet ready for Nehalem?

Thanks,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Bill Broadley wrote:
> I've been working on a pthread memory benchmark that is loosely modeled on
> McCalpin's stream.  It's been quite a challenge to remove all the noise/lost
> performance from the benchmark to get close to performance I expected.  Some
> of the obstacles:
> * For the compilers that tend to be better at stream (open64 and pathscale),
>   you lose the performance if you just replace double a[],b[],c[] with
>   double *a,*b,*c. Patch[1] available.  I don't have a work around for
>   this, suggestions welcome.  Is it really necessary for dynamic arrays
>   to be substantially slower than static?
> * You have to be very careful with pointer alignment both with cache lines,
>   and each other
> * cpu_affinity (by CPU id)
> * numa (by socket id)
> 
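
A minimal sketch of the alignment and affinity items in Bill's list above.
This is not Bill's benchmark code.  It assumes Linux with glibc and a
64-byte cache line, and the names alloc_aligned, pin_to_cpu, triad and
CACHE_LINE are made up for illustration.  The restrict qualifiers tell the
compiler the arrays do not overlap, which is one thing it already knows
about static arrays; whether that recovers the lost performance with
open64 or pathscale is untested here.  NUMA placement by socket id is not
shown.

```c
#define _GNU_SOURCE          /* for CPU_ZERO, CPU_SET, sched_setaffinity */
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64        /* assumed cache-line size in bytes */

/* Allocate n doubles starting on a cache-line boundary.
 * Returns NULL on overflow or allocation failure; release with free(). */
double *alloc_aligned(size_t n)
{
    void *p = NULL;
    if (n > SIZE_MAX / sizeof(double))
        return NULL;
    if (posix_memalign(&p, CACHE_LINE, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* Pin the calling thread to a single CPU id.
 * Returns 0 on success, -1 on failure (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
}

/* STREAM-style triad on dynamically allocated arrays: a = b + s * c. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

Each worker thread would call pin_to_cpu() with its own CPU id first, then
allocate and initialize its slices of a, b and c before timing triad().
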
> The results are relatively smooth graphs, here's an example, it's uselessly
> busy until you toggle off a few graphs (by clicking on the key):
> 
> http://cse.ucdavis.edu/bill/pstream.svg
> 
> The biggest puzzle I have now is why the previous generation Intel quads, the
> current generation AMD quads, and numerous other CPUs show a big benefit in
> L1, while the Nehalem shows no benefit.
> 
> [1] http://cse.ucdavis.edu/bill/stream-malloc.patch



