[Beowulf] bizarre scaling behavior on a Nehalem
Mikhail Kuzminsky
kus at free.net
Wed Aug 12 11:50:16 PDT 2009
In message from Gus Correa <gus at ldeo.columbia.edu> (Wed, 12 Aug 2009
14:09:04 -0400):
>Hi Bill, list
>
>Bill: This is very interesting indeed. Thanks for sharing!
>
>Bill's graph seems to show that Shanghai and Barcelona scale
>(almost) linearly with the number of cores, whereas Nehalem stops
>scaling and flattens out at 4 cores.
>The Nehalem 8 cores and 4 cores curves are virtually
>indistinguishable,
>and for very large arrays 4 cores is ahead.
>Only for huge arrays (>16M) Nehalem gets ahead
>of Shanghai and Barcelona.
IMHO, if the arrays are not "huge", they will fit in the L3 cache (8 MB!).
Or is the X axis given in Mwords?
Mikhail
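
To put numbers on the cache question: a STREAM-style kernel touches three arrays of doubles (a, b and c in Bill's description), so the working set is 3 * N * sizeof(double) bytes. The sketch below computes that figure and the largest N that still fits in a cache of a given size. The function names are made up for illustration, and the 8 MB value is just the L3 size mentioned above; whether the X axis of the graph counts bytes or words remains the open question.

```c
#include <stddef.h>

/* Bytes touched by a STREAM-style kernel that uses three arrays
 * of n doubles each (a, b and c). */
static size_t stream_working_set_bytes(size_t n)
{
    return 3 * n * sizeof(double);
}

/* Largest per-array element count whose three arrays together
 * still fit in a cache of cache_bytes bytes. */
static size_t max_elements_in_cache(size_t cache_bytes)
{
    return cache_bytes / (3 * sizeof(double));
}

/* Nonzero if three arrays of n doubles fit in cache_bytes bytes. */
static int fits_in_cache(size_t n, size_t cache_bytes)
{
    return stream_working_set_bytes(n) <= cache_bytes;
}
```

With 8-byte doubles, max_elements_in_cache(8 * 1024 * 1024) comes to 349525 elements, i.e. a bit under 2.7 MiB per array. Anything much larger than that has to stream from main memory, whichever unit the X axis uses.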
>
>Did I interpret the graph right?
>Wasn't this the type of scaling problem that plagued
>the Clovertown and Harpertown?
>Any possibility that kernels, BIOS, etc, are not yet ready for
>Nehalem?
>
>Thanks,
>Gus Correa
>---------------------------------------------------------------------
>Gustavo Correa
>Lamont-Doherty Earth Observatory - Columbia University
>Palisades, NY, 10964-8000 - USA
>---------------------------------------------------------------------
>
>Bill Broadley wrote:
>> I've been working on a pthread memory benchmark that is loosely modeled on
>> McCalpin's stream. It's been quite a challenge to remove all the noise/lost
>> performance from the benchmark to get close to the performance I expected.
>> Some of the obstacles:
>> * For the compilers that tend to be better at stream (open64 and pathscale),
>>   you lose the performance if you just replace double a[],b[],c[] with
>>   double *a,*b,*c. Patch[1] available. I don't have a workaround for
>>   this, suggestions welcome. Is it really necessary for dynamic arrays
>>   to be substantially slower than static?
>> * You have to be very careful with pointer alignment, both with cache lines
>>   and with each other
>> * cpu_affinity (by CPU id)
>> * numa (by socket id)
>>
>> The results are relatively smooth graphs. Here's an example; it's uselessly
>> busy until you toggle off a few graphs (by clicking on the key):
>>
>> http://cse.ucdavis.edu/bill/pstream.svg
>>
>> The biggest puzzle I have now is why the previous-generation Intel quads, the
>> current-generation AMD quads, and numerous other CPUs show a big benefit in
>> L1, while the Nehalem shows no benefit.
>>
>> [1] http://cse.ucdavis.edu/bill/stream-malloc.patch
>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>Computing
>> To change your subscription (digest mode or unsubscribe) visit
>>http://www.beowulf.org/mailman/listinfo/beowulf
>
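
Bill's points about heap arrays and pointer alignment in the quoted message can be sketched roughly as follows. This is not his patch, only an illustration under stated assumptions: a 64-byte cache line, C11 aligned_alloc for the allocation, and restrict-qualified pointers as one thing to try when a compiler optimizes double *a less well than double a[]. Whether restrict actually recovers the lost performance with open64 or pathscale is not established anywhere in this thread. All names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64

/* Nonzero if p sits on an alignment-byte boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocate n doubles starting on a cache-line boundary.
 * aligned_alloc (C11) expects the size to be a multiple of the
 * alignment, so the byte count is rounded up first.
 * Returns NULL on failure; release the memory with free(). */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + CACHE_LINE_BYTES - 1)
                     / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
    if (rounded == 0)
        rounded = CACHE_LINE_BYTES;
    return aligned_alloc(CACHE_LINE_BYTES, rounded);
}

/* STREAM-style triad over heap arrays. The restrict qualifiers
 * promise the compiler that a, b and c do not overlap, which it
 * cannot assume on its own for plain pointers. */
static void triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```

Typical use would be to allocate a, b and c with alloc_aligned_doubles, fill b and c, and time repeated calls to triad. Thread pinning and NUMA placement, the other two items on Bill's list, are platform specific and are left out of this sketch.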