[Beowulf] performance tweaks and optimum memory configs for a Nehalem

Rahul Nabar rpnabar at gmail.com
Mon Aug 10 09:43:22 PDT 2009

On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn<hahn at mcmaster.ca> wrote:
>> (a) I am seeing strange scaling behaviours with Nehalem cores. e.g. a
>> specific DFT (Density Functional Theory) code we use is maxing out
>> performance at 2, 4 cpus instead of 8. i.e. runs on 8 cores are
>> actually slower than 2 and 4 cores (depending on setup)
> this is on the machine which reports 16 cores, right?  I'm guessing
> that the kernel is compiled without numa and/or ht, so enumerates virtual
> cpus first.  that would mean that when otherwise idle, a 2-core
> proc will get virtual cores within the same physical core.  and that your 8c
> test is merely keeping the first socket busy.

No, this happens on both machines: the one reporting 16 cores and the
one reporting 8, i.e. one with hyperthreading enabled and the other
without. Both have 8 physical cores.
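
For anyone wanting to check the enumeration question on their own box,
here is a rough sketch (not something I have run on these machines) that
groups logical CPUs by physical core. It assumes the usual x86 Linux
/proc/cpuinfo fields "processor", "physical id" and "core id"; the
function name and the sample text are made up for illustration.

```python
from collections import defaultdict


def group_by_physical_core(cpuinfo_text):
    """Map (physical id, core id) -> sorted list of logical CPU numbers."""
    cores = defaultdict(list)
    # /proc/cpuinfo separates each logical CPU's entry with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        cpu = int(fields["processor"])
        # Assumption: if the topology fields are missing, treat every
        # logical CPU as its own core on socket 0.
        socket = int(fields.get("physical id", 0))
        core = int(fields.get("core id", cpu))
        cores[(socket, core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in cores.items()}


# Hypothetical sample: logical CPUs 0 and 1 share one physical core.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t: 0\n\n"
    "processor\t: 2\nphysical id\t: 0\ncore id\t: 1\n"
)

for (socket, core), cpus in sorted(group_by_physical_core(SAMPLE).items()):
    print(f"socket {socket} core {core}: logical CPUs {cpus}")
```

On a real node you would pass in open("/proc/cpuinfo").read() instead of
SAMPLE. Any core listing two logical CPUs has hyperthread siblings, and
the numbering shows whether siblings are enumerated next to each other.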

What is bizarre is that I tried using -np 16. That ought to definitely
utilize all cores, right? I'd have expected the 16-core performance to
be the best. But no, the performance peaks at a smaller number of


