[Beowulf] bizarre scaling behavior on a Nehalem

Mon Aug 10 15:28:59 PDT 2009

On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho<coutinho at dcc.ufmg.br> wrote:
> This is often caused by cache competition or memory bandwidth saturation.
> If it was cache competition, rising from 4 to 6 threads would make it worse.
> As the code became faster with DDR3-1600 and much slower with Xeon 5400,
> this code is memory bandwidth bound.
> Tweaking CPU affinity to avoid thread jumping among cores of the will not
> help much, as the big bottleneck is memory bandwidth.
> To this code, CPU affinity will only help in NUMA machines to maintain
> memory access in local memory.
>
>
> If the machine has enough bandwidth to feed the cores, it will scale.

Exactly! But I thought the big advance with the Nehalem was that it
removed the CPU<->Cache<->RAM bottleneck. So if the code scaled on
the AMD Barcelona, shouldn't it continue to scale on the Nehalem?

I'm posting a copy of my scaling plot here in case it helps.

http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg

To remove most of the possible confounding factors, this particular
Nehalem plot was produced with the following settings:

Hyperthreading OFF
24GB memory, i.e. 6 banks of 4GB (the optimum memory configuration)
X5550
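
For reference, a back-of-the-envelope estimate of the theoretical peak
memory bandwidth for a configuration like this one. It is only a sketch
under assumptions not stated above: two sockets, three memory channels
per socket with one DIMM each (which is how 6 banks would normally be
populated), DDR3-1333 (the fastest speed the X5550 is rated for; the
DIMMs actually installed may be slower), and a 64-bit channel. Sustained
bandwidth will be well below this number.

```python
# Rough theoretical peak memory bandwidth. All inputs are assumptions,
# not measurements from the machine discussed above.

def peak_bandwidth_gbs(mega_transfers_per_s, channels, bus_bytes=8):
    """Peak bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9

SOCKETS = 2             # assumed dual-socket node
CHANNELS_PER_SOCKET = 3 # assumed: one DIMM on each of three channels
DDR3_SPEED = 1333       # assumed DIMM speed in MT/s
CORES = 8               # 2 sockets x 4 cores, Hyperthreading off

per_socket = peak_bandwidth_gbs(DDR3_SPEED, CHANNELS_PER_SOCKET)
total = SOCKETS * per_socket
per_core = total / CORES

print(f"per socket: {per_socket:.1f} GB/s")
print(f"node total: {total:.1f} GB/s")
print(f"per core with all {CORES} cores busy: {per_core:.1f} GB/s")
```

Comparing the per-core figure at 4 and at 8 active cores against what
the code actually pulls from memory is one way to check the
bandwidth-saturation explanation.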

Even if we attributed the bizarre performance of the 4 node case to
the Turbo effect, what is most confusing is how the 8 core data
point could be so much slower than the corresponding 8 core point on
an old AMD Barcelona.

Something's wrong here that I just do not understand. BTW, any other
VASP users here? Anybody have any Nehalem experience?

--
Rahul