[Beowulf] performance tweaks and optimum memory configs for a Nehalem

Gus Correa gus at ldeo.columbia.edu
Fri Aug 7 17:58:39 PDT 2009

Hi Rahul, list

In case you haven't read it, this Nehalem memory guide from Dell
has good information, including the recommended memory configurations:


A researcher here bought a Nehalem workstation (not a cluster)
with 24GB RAM also.  We followed the article's recommendation,
which was also what the vendor suggested.
Maybe 24GB is more than needed, but it presumably avoids the
performance penalty that would hit a 16GB configuration, where
4 DIMMs cannot populate the six memory channels (three per socket) evenly.
Since the computer will mostly run Matlab jobs,
and Matlab has no bounds when it comes to memory,
it may not have been a waste anyway.
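A rough way to see the 16GB vs. 24GB issue is to count DIMMs per channel.
The Python sketch below assumes a two-socket E5520 node with three DDR3
channels per socket, and fills slots alternately across sockets, then across
channels. That fill order is a simplifying assumption; the vendor guide's
slot-population rules take precedence on a real board. The names
(NodeConfig, dimm_layout, summarize) are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    sockets: int = 2
    channels_per_socket: int = 3  # Nehalem-EP: three DDR3 channels per socket
    cores_per_socket: int = 4     # E5520 is a quad-core part


def dimm_layout(n_dimms: int, cfg: NodeConfig) -> list[list[int]]:
    """Count DIMMs per channel, filling sockets alternately, then channels.

    This fill order is a simplifying assumption, not a board's actual rule.
    """
    layout = [[0] * cfg.channels_per_socket for _ in range(cfg.sockets)]
    for i in range(n_dimms):
        socket = i % cfg.sockets
        channel = (i // cfg.sockets) % cfg.channels_per_socket
        layout[socket][channel] += 1
    return layout


def summarize(n_dimms: int, dimm_gb: int, cfg: NodeConfig = NodeConfig()) -> dict:
    layout = dimm_layout(n_dimms, cfg)
    per_channel = [count for socket in layout for count in socket]
    total_gb = n_dimms * dimm_gb
    cores = cfg.sockets * cfg.cores_per_socket
    return {
        "layout": layout,
        # Balanced means every channel holds the same number of DIMMs.
        "balanced": min(per_channel) == max(per_channel),
        "total_gb": total_gb,
        "gb_per_core": total_gb / cores,
    }


for n in (4, 6):
    print(f"{n} x 4GB:", summarize(n, 4))
```

Under these assumptions, 4 x 4GB leaves one channel per socket empty
(2 GB/core), while 6 x 4GB puts one DIMM on every channel (3 GB/core).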

Some people are reporting good results when using the
Nehalem hyperthreading feature (enabled in the BIOS).
When the code permits, this virtually doubles the number
of cores on Nehalems (two hardware threads per physical core).
That feature works very well on IBM POWER6 processors
(IBM calls it "simultaneous multithreading", SMT, IIRC),
and scales by a factor of >1.5, at least with the atmospheric
model I tried.
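To confirm that hyperthreading actually got turned on after changing the
BIOS, one can compare the number of logical CPUs with the number of distinct
(physical id, core id) pairs in /proc/cpuinfo. A minimal Python sketch,
assuming an x86 Linux kernel that reports those fields (other architectures
may omit them); the function names are hypothetical.

```python
import os


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo text into one dict per logical processor."""
    procs = []
    for chunk in text.strip().split("\n\n"):
        entry = {}
        for line in chunk.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            procs.append(entry)
    return procs


def topology(text: str) -> dict[str, int]:
    procs = parse_cpuinfo(text)
    # A physical core is identified by its (socket, core) pair; assumes x86 fields.
    cores = {(p.get("physical id"), p.get("core id")) for p in procs}
    sockets = {p.get("physical id") for p in procs}
    return {
        "sockets": len(sockets),
        "physical_cores": len(cores),
        "logical_cpus": len(procs),
        "threads_per_core": len(procs) // max(len(cores), 1),
    }


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        info = topology(f.read())
    print(info)
    if info["threads_per_core"] > 1:
        print("Hyperthreading appears to be enabled")
```

On a dual E5520 with hyperthreading on, one would expect 8 physical cores
and 16 logical CPUs.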

This may be a useful way to put your 24GB to work, say, by running 12
processes on an 8-core node (50% oversubscribed),
instead of the 8 processes that you run today on the Barcelonas.
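The arithmetic behind the 12-process suggestion, as a small Python sketch.
It assumes 2 hardware threads per core with hyperthreading on, and about
2 GB per process as stated for your code; the function name plan is
hypothetical.

```python
def plan(processes: int, physical_cores: int, threads_per_core: int,
         total_gb: float) -> dict[str, float]:
    """Oversubscription and memory per process for one node (illustrative)."""
    hw_threads = physical_cores * threads_per_core
    if processes > hw_threads:
        raise ValueError("more processes than hardware threads")
    return {
        # How far past one process per physical core we go.
        "oversubscription_pct": 100 * (processes - physical_cores) / physical_cores,
        "hw_thread_use_pct": 100 * processes / hw_threads,
        "gb_per_process": total_gb / processes,
    }


print(plan(12, 8, 2, 24))  # 12 processes, hyperthreading on, 24GB
print(plan(8, 8, 2, 16))   # 8 processes on a 16GB node
```

With 24GB, 12 processes come out at 50% oversubscribed, using 75% of the 16
hardware threads, with 2 GB each, which still matches the 2 GB/core the code
needs.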

As for compiler flags, if you are using Intel these are probably good:

-xS (which gives you SSE4, but check if there is something fancier now
for Nehalem)
-fast, although some of our codes had problems with the multi-file
interprocedural optimization (-ipo) that is part of -fast, and I had to
replace it with -ip (single-file) plus the other flags that -fast implies.

I hope this helps,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Rahul Nabar wrote:
> Is it a bad mistake to configure a Nehalem (2 sockets quad core giving
> a total of 8 cores; E5520) with 16 GB RAM (4 DIMMs of 4GB each)? I
> know (I think) that the optimized memory for Nehalems is in banks of 6
> due to the way the architecture is? I have often seen Nehalems coming
> with 24 GB memory as 6 DIMMs of 4 GB each.
> Our code requirements dictate 2 GB / core is enough. Should I be
> paying for the additional RAM to make it 24 GB?
> Also, are there any other tips for the Nehalems in general to coax out
> max performance? Maybe some compiler flags or BIOS settings etc? The
> only thing I did so far was to put the BIOS power setting into a "max
> performance" mode.
> In the past I've gotten about 5% additional performance by changing
> the power profile to "performance" using cpufreq-set on my AMD
> Opteron Barcelonas. Any similar gotchas for the Nehalems and HPC?
