[Beowulf] Your thoughts on use of NUMA-based systems in clusters?

Stuart Midgley sdm900 at gmail.com
Thu Sep 21 16:16:41 PDT 2006


I was involved in the design, procurement and initial setup of a
1936-cpu SGI Altix 3700Bx2 cluster based on 32p nodes, with NUMAlink
as both the shared memory and the cluster interconnect.

The machine mostly works as expected and you can treat it as a
standard beowulf cluster... except for the queue and scheduler
software.  Your scheduler really needs to be numa-aware (it should know
the topology of your interconnect, or of the shared memory within a
node, and try to keep a job's processes as close together as possible)
and, with such large cluster nodes, it also needs to be able to use
cpu-sets to lock MPI processes down to specific cpu/mem sets.  Without
this, processes migrate between cpus, pages get sprayed all over memory
and performance goes out the window.
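
To illustrate the pinning idea only: a minimal sketch, assuming Linux and
Python's os.sched_setaffinity.  The helper names (cpus_for_rank,
pin_current_process) are hypothetical, and this is not how cpu-sets
themselves are implemented; a real cpu-set also restricts which memory
nodes a process may allocate from, whereas plain affinity relies on the
kernel's default local-allocation policy to keep pages near the cpu.

```python
import os


def cpus_for_rank(local_rank, cpus_per_rank, allowed):
    """Return the contiguous slice of `allowed` cpus for one local rank.

    `allowed` is any iterable of cpu ids available on this node.
    """
    ordered = sorted(allowed)
    start = local_rank * cpus_per_rank
    chosen = ordered[start:start + cpus_per_rank]
    if len(chosen) < cpus_per_rank:
        raise ValueError("not enough cpus on this node for this rank")
    return set(chosen)


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` and return the new mask."""
    if not hasattr(os, "sched_setaffinity"):
        # The affinity calls are only available on some platforms (e.g. Linux).
        raise NotImplementedError("cpu affinity is not supported here")
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Each process on a node would call something like
pin_current_process(cpus_for_rank(rank, 1, os.sched_getaffinity(0)))
before it allocates its working memory, where rank is its index among
the processes on that node (how a launcher exposes that index varies).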

We were lucky: one of my colleagues maintains a heavily modified
OpenPBS, which is numa-aware, and another rewrote SGI's mpirun to
place MPI processes into cpu-sets.  This means that users get
excellent performance and reliable run times, which is important in
our environment because they are expected to request the walltime
that their jobs will run for.
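
As a rough illustration of what numa-aware placement means (this is not
the modified OpenPBS, whose internals are not described here), the
sketch below picks cpus for a job so that it spans as few memory nodes
as possible.  The function name place_job and the free_by_node layout
are assumptions made for the example, and it ignores the distance
between nodes, which a real topology-aware scheduler would also weigh.

```python
def place_job(free_by_node, ncpus):
    """Pick `ncpus` free cpus touching as few NUMA nodes as possible.

    `free_by_node` maps a NUMA node id to the list of free cpu ids on it.
    Returns {node: [cpus]} or None if the request cannot be satisfied.
    """
    # Best case: a single node holds the whole job.  Prefer the tightest
    # fit so that larger holes stay available for larger jobs.
    fitting = [n for n, cpus in free_by_node.items() if len(cpus) >= ncpus]
    if fitting:
        node = min(fitting, key=lambda n: (len(free_by_node[n]), n))
        return {node: sorted(free_by_node[node])[:ncpus]}

    # Otherwise fill from the nodes with the most free cpus first, which
    # keeps the number of nodes the job spans small.
    placement, needed = {}, ncpus
    by_free = sorted(free_by_node, key=lambda n: (-len(free_by_node[n]), n))
    for node in by_free:
        if needed == 0:
            break
        take = sorted(free_by_node[node])[:needed]
        if take:
            placement[node] = take
            needed -= len(take)
    return placement if needed == 0 else None
```

The cpus returned for each node are what would then go into the job's
cpu-set, so that the job's processes and their pages stay together.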


On 21/09/2006, at 22:59, Clements, Brent M ((SAIC)) wrote:

> Out of my own curiosity, would those of you that have dealt with
> current/next generation Intel-based NUMA systems give me your
> opinions on why/why not you would buy or use them as a cluster node.
> I'm looking for primarily technical opinions.
> Thanks!

Dr Stuart Midgley
sdm900 at gmail.com
