[Beowulf] bizarre scaling behavior on a Nehalem
Mikhail Kuzminsky
kus at free.net
Wed Aug 12 08:14:25 PDT 2009
In message from Craig Tierney <Craig.Tierney at noaa.gov> (Tue, 11 Aug
2009 11:40:03 -0600):
>Rahul Nabar wrote:
>> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho
>> <coutinho at dcc.ufmg.br> wrote:
>>> This is often caused by cache competition or memory bandwidth
>>> saturation. If it were cache competition, going from 4 to 6 threads
>>> would make it worse. Since the code became faster with DDR3-1600 and
>>> much slower on the Xeon 5400, this code is memory bandwidth bound.
>>> Tweaking CPU affinity to stop threads from jumping among cores will
>>> not help much, as the big bottleneck is memory bandwidth. For this
>>> code, CPU affinity will only help on NUMA machines, by keeping
>>> memory accesses local.
>>>
>>> If the machine has enough bandwidth to feed the cores, it will
>>> scale.
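To illustrate Bruno's point: on a NUMA machine the usual recipe is to
pin each thread to one core and rely on first-touch placement, so that
the pages a thread works on end up on its local node. A minimal sketch
in C/OpenMP (my illustration, not VASP code; the core numbering and
array size are placeholders):

    /* Sketch: pin each OpenMP thread to one core and use first touch
       so each thread's pages land on its local NUMA node. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        const long n = 1L << 26;          /* placeholder: ~512 MB of doubles */
        double *a = malloc(n * sizeof *a);
        if (!a) return 1;

        #pragma omp parallel
        {
            /* Processor affinity: bind this thread to core 'tid'
               (placeholder 1:1 mapping of threads to cores). */
            int tid = omp_get_thread_num();
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(tid, &set);
            sched_setaffinity(0, sizeof set, &set);

            /* Memory affinity by first touch: the thread that first
               writes a page gets it placed on its own node. */
            #pragma omp for schedule(static)
            for (long i = 0; i < n; i++)
                a[i] = 0.0;
        }

        printf("initialized %ld doubles with up to %d threads\n",
               n, omp_get_max_threads());
        free(a);
        return 0;
    }

On a non-NUMA box the first-touch part changes nothing, which is
Bruno's point that affinity mainly pays off when memory is NUMA.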
>>
>> Exactly! But I thought the big advance with the Nehalem was that it
>> removed the CPU<->cache<->RAM bottleneck. So if the code scaled with
>> the AMD Barcelona, it should continue to scale with the Nehalem,
>> right?
>>
>> I'm posting a copy of my scaling plot here if it helps.
>>
>> http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg
>>
>> To remove as many confounding factors as possible, this particular
>> Nehalem plot was produced with the following settings:
>>
>> Hyperthreading OFF
>> 24GB memory, i.e. 6 banks of 4GB (the optimum memory configuration)
>> X5550
>>
>> Even if we explain away the bizarre performance of the 4-core case
>> by the Turbo effect, what is most confusing is how the 8-core data
>> point could be so much slower than the corresponding 8-core point on
>> an old AMD Barcelona.
>>
>> Something's wrong here that I just do not understand. BTW, any other
>> VASP users here? Anybody have any Nehalem experience?
>>
>
>Rahul,
>What are you doing to ensure that you have both memory and processor
>affinity enabled?
>Craig
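One quick sanity check for the processor-affinity half of Craig's
question (a minimal sketch of mine, not something from the thread) is
to have each process report which cores it may run on and where it is
right now:

    /* Sketch: print the cores this process is allowed to run on and
       the core it is currently executing on. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        if (sched_getaffinity(0, sizeof set, &set) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        printf("pid %d may run on cores:", (int)getpid());
        for (int c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &set))
                printf(" %d", c);

        printf("\ncurrently on core %d\n", sched_getcpu());
        return 0;
    }

If every rank prints the full core list, nothing is pinning them; the
memory half still has to be checked separately (e.g. with numactl or
libnuma).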
As I mentioned here in the "numactl&SuSE11.1" thread, some kernels
behave incorrectly on Nehalem (bad /sys/devices/system/node directory
contents). This bug is present, in particular, in the default
OpenSuSE 11 kernels (2.6.27.7-9 and 2.6.29-6) and, as was written in
the corresponding thread, in the FC11 2.6.29 kernel.

I found that in such a situation disabling NUMA in the BIOS only
increased STREAM throughput. Therefore I think this problem of
Rahul's is not due to BIOS settings. Unfortunately I have no data
about VASP itself.
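A quick way to see what a given kernel exposes is simply to list that
directory; a small C sketch (illustration only) that prints and counts
the node* entries:

    /* Sketch: list the node* entries under /sys/devices/system/node
       to see how many NUMA nodes the running kernel exposes. */
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DIR *d = opendir("/sys/devices/system/node");
        if (!d) {
            perror("/sys/devices/system/node");
            return 1;
        }

        struct dirent *e;
        int nodes = 0;
        while ((e = readdir(d)) != NULL) {
            if (strncmp(e->d_name, "node", 4) == 0) {
                printf("%s\n", e->d_name);
                nodes++;
            }
        }
        closedir(d);

        printf("kernel exposes %d NUMA node(s)\n", nodes);
        return 0;
    }

On a dual-socket Nehalem with NUMA enabled in the BIOS one would
expect to see node0 and node1 here; on the buggy kernels mentioned
above the contents of this directory are wrong.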
It would be interesting to know whether somebody has kernels that
work correctly with Nehalem in the NUMA sense. AFAIK older 2.6
kernels (from SuSE 10.3) work OK, but I didn't check. Maybe an error
in NUMA support is the reason for Rahul's problem?
Mikhail