[Beowulf] Your thoughts on use of NUMA-based systems in clusters?
Stuart Midgley
sdm900 at gmail.com
Thu Sep 21 16:16:41 PDT 2006
Afternoon
I was involved in the design, procurement and initial setup of a
1936-CPU SGI Altix 3700 Bx2 cluster built from 32-processor nodes,
with NUMAlink serving as both the shared-memory fabric and the
cluster interconnect.
The machine mostly works as expected, and you can treat it as a
standard Beowulf cluster... except for the queue and scheduler
software. Your scheduler really needs to be NUMA-aware (it must know
the topology of your interconnect and of the shared memory within
each node, and try to keep a job's processes as close together as
possible), and with such large cluster nodes it also needs to be able
to use cpusets to lock MPI threads down to specific cpu/memory sets.
Without this, threads migrate, pages get sprayed all over memory, and
performance goes out the window.
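To make the pinning idea concrete, here is a minimal sketch of what
"locking a thread to a specific cpu" means at the OS level, using the
Linux affinity call exposed by Python's standard library. This is an
illustration only, not the Altix-era tooling (which used kernel
cpusets and SGI's own placement tools); the function name pin_to_cpu
is my own, and the call is Linux-specific.

```python
import os

def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU so the scheduler
    cannot migrate it (Linux-only; pid 0 means 'this process')."""
    os.sched_setaffinity(0, {cpu})
    # Return the new mask so callers can verify the pin took effect.
    return os.sched_getaffinity(0)

# Once a thread is pinned, memory it touches is allocated (under the
# usual first-touch policy) on the NUMA node that owns that CPU, so
# its pages stay local instead of being sprayed across the machine.
```

For example, pin_to_cpu(0) restricts the process to CPU 0 and returns
the resulting affinity set {0}.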
We were lucky: one of my colleagues maintains a heavily modified
OpenPBS that is NUMA-aware, and another rewrote SGI's mpirun to
place MPI processes into cpusets. This means that users get
excellent performance and reliable run times, which is important in
our environment because they are expected to request the walltime
that their jobs will run for.
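The rank-placement part of such an mpirun rewrite can be sketched as
a tiny per-rank wrapper. Everything here is assumed for illustration:
the helper name bind_cmd, the figure of 2 cpus per NUMA node, and the
use of numactl (a modern Linux tool, not what SGI's mpirun actually
did). The sketch only prints the binding command rather than exec'ing
it.

```shell
# bind_cmd RANK PROGRAM...: print the numactl invocation that would
# pin the given MPI local rank to one cpu and its NUMA node's memory.
# Hypothetical helper; assumes 2 cpus per NUMA node.
bind_cmd() {
    rank=$1; shift
    cpus_per_node=2
    node=$((rank / cpus_per_node))   # NUMA node owning this rank's cpu
    echo "numactl --physcpubind=$rank --membind=$node $*"
}
```

A real wrapper would exec the printed command (one cpu and one memory
node per rank), so every rank's pages stay on its own node.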
Stu.
On 21/09/2006, at 22:59, Clements, Brent M (SAIC) wrote:
> Out of my own curiosity, would those of you that have dealt with
> current/next-generation Intel-based NUMA systems give me your
> opinions on why/why not you would buy or use them as a cluster node.
>
> I'm looking for primarily technical opinions.
>
> Thanks!
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Dr Stuart Midgley
sdm900 at gmail.com