[Beowulf] performance tweaks and optimum memory configs for a Nehalem
Tom Elken
tom.elken at qlogic.com
Mon Aug 10 14:07:23 PDT 2009
> Well, as there are only 8 "real" cores, running a computationally
> intensive process across 16 should *definitely* do worse than across 8.
Not typically.
At the SPEC website there are quite a few SPEC MPI2007 results on Nehalem (SPEC MPI2007 reports a single metric averaged across 13 HPC applications).
Summary:
IBM, SGI and Platform have comparisons on clusters with "SMT On" of running 1 rank per core versus 2 ranks per core. In general, at low core counts (up to about 32) there is roughly an 8% advantage for running 2 ranks per core. At larger core counts, IBM published a pair of results on 64 cores where the 64-rank performance was equal to the 128-rank performance. Not all of these applications scale linearly, so on some of them you lose efficiency at 128 ranks compared to 64 ranks.
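To make the two configurations concrete, here is a minimal sketch (in Python, assuming a hypothetical cluster of 2-socket, quad-core Nehalem nodes) of the rank counts being compared; the node geometry is an assumption for illustration, not taken from the SPEC reports:

    # Illustration only: rank counts for the two SMT configurations
    # compared above (1 rank per core vs. 2 ranks per core).
    # The node geometry below is an assumed example, not from the reports.
    def rank_counts(nodes, sockets_per_node=2, cores_per_socket=4, smt_ways=2):
        cores = nodes * sockets_per_node * cores_per_socket
        return {"1 rank per core": cores,
                "2 ranks per core": cores * smt_ways}

    print(rank_counts(nodes=8))
    # {'1 rank per core': 64, '2 ranks per core': 128}

With that geometry, the 64-rank vs. 128-rank pair above runs on the same 64 cores; only the rank count is doubled.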
Details: Results from this year are mostly on Nehalem:
http://www.spec.org/mpi2007/results/res2009q3/ (IBM)
http://www.spec.org/mpi2007/results/res2009q2/ (Platform)
http://www.spec.org/mpi2007/results/res2009q1/ (SGI)
(Intel has results with Turbo mode turned on and off
in the q2 and q3 results, for a different comparison)
Or you can pick out the Xeon 'X5570' and 'X5560' results from the list of all results:
http://www.spec.org/mpi2007/results/mpi2007.html
In the result index, when
"Compute Threads Enabled" = 2x "Compute Cores Enabled", you know SMT is turned on.
In those cases, if
"MPI Ranks" = "Compute Threads Enabled", then the result was run with 2 ranks per core.
-Tom
> However, it's not so surprising that you're seeing peak performance with
> 2-4 threads. Nehalem can actually overclock itself when only some of the
> cores are busy -- it's called Turbo Mode. That *could* be what you're
> seeing.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF