[Beowulf] bizarre scaling behavior on a Nehalem

Rahul Nabar rpnabar at gmail.com
Mon Aug 10 09:41:06 PDT 2009

A while ago Tiago Marques posted some benchmarking numbers in a
thread ( http://www.beowulf.org/archive/2009-May/025739.html ), and
some recent tests of mine have made me interested in this snippet
again:

>One of the codes, VASP, is very bandwidth limited and loves to run on a
>number of cores that is a multiple of 3. The 5400s are also very bandwidth
>limited (memory and FSB), so they sometimes don't scale well above 6
>cores. They are very fast per core, as someone mentioned, when compared to
>AMD cores.

>These are the times I get from a benchmark I usually run in VASP:
>VASP on Core i7:
>        - 1 core  = 162.453s, 162.778s (no HT)
>        - 2 cores = 100s, 102s (no HT)
>        - 3 cores = 77.835s, 78.195s (no HT)
>        - 4 cores = 87.63s, 87.322s (no HT)
>        - 6 cores = *76.56s, 76.4s*
>        - 6 cores, DDR3-1600 CAS9 = 69.654s, 68.816s, 67.7s
>HT doesn't add much but DDR3-1600 does. Still, ~78s is very fast with a
>quad-core because our dual 5400s can only do *91s* at best, even using
>tweaks like CPU affinity, which brings it down from 95s, by distributing
>only 3 threads per socket and not 4/2 or having 4 of them constantly jumping
>from socket to socket.

Apparently this shows that, for VASP, the Nehalems scale well only up
to 3 cores? Putting 4 cores on the job actually causes the runtime to
increase? This seemed pretty bizarre to me at first sight, but it is
close to what I am getting as well. Has anyone else seen similar
scaling? (I am trying the CPU affinity flags now to see if that makes
a difference.)
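
To put rough numbers on this, here is a small sketch that turns the
quoted timings into speedup and parallel efficiency relative to the
1-core run. It only uses the "no HT" figures from the quote above and
averages the two runs reported for each core count. The 6-core entries
are left out because, on a quad-core part, they presumably rely on HT
(that is my assumption, not something the quote states outright). The
function name `scaling` is just illustrative.

```python
# Wall-clock times in seconds from the quoted Core i7 VASP runs (no HT).
# Two runs were reported per core count; the mean of the two is used.
times = {
    1: (162.453 + 162.778) / 2,
    2: (100.0 + 102.0) / 2,
    3: (77.835 + 78.195) / 2,
    4: (87.63 + 87.322) / 2,
}


def scaling(times):
    """Return (cores, time, speedup, efficiency) rows relative to 1 core."""
    base = times[1]
    rows = []
    for n in sorted(times):
        speedup = base / times[n]      # how many times faster than 1 core
        efficiency = speedup / n       # fraction of ideal linear scaling
        rows.append((n, times[n], speedup, efficiency))
    return rows


for n, t, s, e in scaling(times):
    print(f"{n} cores: {t:8.3f} s  speedup {s:4.2f}  efficiency {e:4.2f}")
```

With these figures the speedup is about 2.08x on 3 cores and drops to
about 1.86x on 4 cores, so the fourth core makes the job slower instead
of merely adding less than the third did.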

How would you explain this? In the past I've seen the codes scale well
to core numbers higher than this.

