[Beowulf] performance tweaks and optimum memory configs for a Nehalem

Håkon Bugge h-bugge at online.no
Tue Aug 11 00:43:03 PDT 2009

On Aug 10, 2009, at 23:07 , Tom Elken wrote:
> Summary:
> IBM, SGI and Platform have some comparisons on clusters with "SMT
> On" of running 1 rank for every core compared to running 2 ranks on
> every core.  In general, at low core counts, such as up to 32, there
> is about an 8% advantage for running 2 ranks per core.  At larger core
> counts, IBM published a pair of results on 64 cores where the
> 64-rank performance was equal to the 128-rank performance.  Not all of
> these applications scale linearly, so on some of them you lose
> efficiency at 128 ranks compared to 64 ranks.
> Details: Results from this year are mostly on Nehalem:
> http://www.spec.org/mpi2007/results/res2009q3/ (IBM)
> http://www.spec.org/mpi2007/results/res2009q2/ (Platform)
> http://www.spec.org/mpi2007/results/res2009q1/ (SGI)
>  (Intel has results with Turbo mode turned on and off
>    in the q2 and q3 results, for a different comparison)
> Or you can pick out the Xeon 'X5570' and 'X5560' results from the  
> list of all results:
> http://www.spec.org/mpi2007/results/mpi2007.html
> In the result index, when
> "Compute Threads Enabled" = 2x "Compute Cores Enabled", then you
> know SMT is turned on.
> In these cases, you can then check that when
> "MPI Ranks" = "Compute Threads Enabled" then you are running 2
> ranks per core.
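
The rule quoted above for reading the result index can be written down
as a small check. This is only a sketch: the function name, its
parameter names and the returned labels are made up for illustration
and are not part of any SPEC tool, and the numbers in the usage lines
are illustrative rather than taken from a published result.

```python
def classify_result(cores_enabled: int, threads_enabled: int, mpi_ranks: int) -> str:
    """Classify one row of the result index using the quoted rule.

    All three arguments are hypothetical names for the index columns
    "Compute Cores Enabled", "Compute Threads Enabled" and "MPI Ranks".
    """
    # SMT is taken to be on when threads enabled = 2x cores enabled.
    smt_on = threads_enabled == 2 * cores_enabled
    if not smt_on:
        return "SMT off or not determinable"
    # With SMT on, ranks = threads means 2 ranks per core.
    if mpi_ranks == threads_enabled:
        return "SMT on, 2 ranks per core"
    # With SMT on, ranks = cores means 1 rank per core.
    if mpi_ranks == cores_enabled:
        return "SMT on, 1 rank per core"
    return "SMT on, other rank count"


# Illustrative values only: 64 cores with SMT on, run with 128 and 64 ranks.
print(classify_result(64, 128, 128))
print(classify_result(64, 128, 64))
```

Any row where the thread count is not exactly twice the core count
falls into the "not determinable" bucket, since the quoted rule says
nothing about those cases.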


Thanks for the neatly compiled information above. I can just add that
I have conducted a fairly detailed analysis of Nehalem compared to
Harpertown in my paper "An evaluation of Intel's core i7 architecture
using a comparative approach", presented at ISC'09. There, I look at
different aspects of the memory hierarchy of the two processors. The
benefits of hyperthreading for the 13 SPEC MPI2007 applications
are also studied, although using only a single node, where the
advantage is more pronounced.



