[Beowulf] bizarre scaling behavior on a Nehalem
Craig.Tierney at noaa.gov
Tue Aug 11 10:40:03 PDT 2009
Rahul Nabar wrote:
> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho<coutinho at dcc.ufmg.br> wrote:
>> This is often caused by cache competition or memory bandwidth saturation.
>> If it were cache competition, going from 4 to 6 threads would make it worse.
>> Since the code got faster with DDR3-1600 and much slower on the Xeon 5400,
>> this code is memory-bandwidth bound.
>> Tweaking CPU affinity to stop threads jumping among cores of the same
>> socket will not help much, as the big bottleneck is memory bandwidth.
>> For this code, CPU affinity will only help on NUMA machines, by keeping
>> memory accesses in local memory.
>> If the machine has enough bandwidth to feed the cores, it will scale.
> Exactly! But I thought the big advance with the Nehalem was that it
> removed the CPU<->cache<->RAM bottleneck. So if the code scaled on the
> AMD Barcelona, it should continue to scale on the Nehalem, right?
> I'm posting a copy of my scaling plot here if it helps.
> To remove as many confounding factors as possible, this particular Nehalem
> plot was produced with the following settings:
> Hyperthreading OFF
> 24GB memory, i.e. 6 banks of 4GB, i.e. the optimal memory configuration
> Even if we explained away the bizarre performance of the 4-core case
> by the Turbo effect, what is most confusing is how the 8-core data
> point could be so much slower than the corresponding 8-core point on an
> old AMD Barcelona.
> Something's wrong here that I just do not understand. BTW, any other
> VASP users here? Anybody have any Nehalem experience?
What are you doing to ensure that you have both memory and processor
affinity?
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Craig Tierney (craig.tierney at noaa.gov)
More information about the Beowulf mailing list