[Beowulf] Again about NUMA (numactl and taskset)
Håkon Bugge
Hakon.Bugge at scali.com
Tue Jun 24 03:15:11 PDT 2008
At 07:56 24.06.2008, Chris Samuel wrote:
> > Your MPI (and OpenMP) should do this for you.
>
>Although not always correctly, it may assume that it can
>allocate from core 0 onwards leading to odd performance
>issues if you happen to get two 4 CPU jobs running on the
>same node..
If the MPI is doing that, it is really stupid.
Physical core IDs are just that: physical. Users and
applications should stay away from binding to
physical resources, since others may be using them.
These physical IDs vary greatly, and different
enumerations have even been observed on the same
motherboard between boots. Imagine operating a
cluster of these when users hard-code these IDs ;-)
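To illustrate why hard-coded IDs are fragile: on Linux, the mapping from OS CPU numbers to sockets and cores can be read from sysfs at run time rather than assumed. Below is a minimal Python sketch. The function name read_topology and its sysfs_root parameter are made up for illustration. It assumes the usual Linux topology files (physical_package_id and core_id) are present, and it is not meant to show how any particular MPI does this.

```python
import glob
import os
import re


def read_topology(sysfs_root="/sys/devices/system/cpu"):
    """Return {(socket, core): [os_cpu_number, ...]} as reported by Linux sysfs.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real system the default is the usual location.
    """
    topo = {}
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(cpu_dir))
        if not match:
            continue
        topo_dir = os.path.join(cpu_dir, "topology")
        try:
            with open(os.path.join(topo_dir, "physical_package_id")) as f:
                socket = int(f.read())
            with open(os.path.join(topo_dir, "core_id")) as f:
                core = int(f.read())
        except OSError:
            # Offline CPU or no topology information exported: skip it.
            continue
        topo.setdefault((socket, core), []).append(int(match.group(1)))
    return {key: sorted(cpus) for key, cpus in topo.items()}


if __name__ == "__main__":
    # Print the logical (socket, core) view next to the OS CPU numbers.
    for (socket, core), cpus in sorted(read_topology().items()):
        print("socket=%d core=%d -> OS cpus %s" % (socket, core, cpus))
```

Code that asks for "socket 1, core 0" through such a map keeps working when the OS numbering changes between boots, whereas code that asks for "CPU 4" does not.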
IMHO, the MPI should virtualize these resources
and relieve the end-user/application programmer
from the burden. Here's one example using Scali
MPI Connect running two jobs, each with four MPI
processes on a dual-socket, quad-core Barcelona system:
First job:
(submon-1 at barcelona-1) Affinity 'automatic' policy BANDWIDTH granularity CORE nprocs 4
(submon-1 at barcelona-1) Will bind process 0 with mask=0000000000000001 [(board=0, socket=0, core=0, execunit=0)]
(submon-1 at barcelona-1) Will bind process 1 with mask=0000000000010000 [(board=0, socket=1, core=0, execunit=0)]
(submon-1 at barcelona-1) Will bind process 2 with mask=0000000000000010 [(board=0, socket=0, core=1, execunit=0)]
(submon-1 at barcelona-1) Will bind process 3 with mask=0000000000100000 [(board=0, socket=1, core=1, execunit=0)]
Second job:
(submon-1 at barcelona-1) Affinity 'automatic' policy BANDWIDTH granularity CORE nprocs 4
(submon-1 at barcelona-1) Will bind process 0 with mask=0000000000000100 [(board=0, socket=0, core=2, execunit=0)]
(submon-1 at barcelona-1) Will bind process 1 with mask=0000000001000000 [(board=0, socket=1, core=2, execunit=0)]
(submon-1 at barcelona-1) Will bind process 2 with mask=0000000000001000 [(board=0, socket=0, core=3, execunit=0)]
(submon-1 at barcelona-1) Will bind process 3 with mask=0000000010000000 [(board=0, socket=1, core=3, execunit=0)]
And this is with the default settings. Other
policies and 'resolutions', as we call them, can be applied.
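The placement seen in the two logs can be reproduced with a simple round-robin over sockets that skips cores already taken by another job. The Python sketch below is only an illustration of that idea, not Scali MPI Connect's actual algorithm. The function place and its parameters are hypothetical. It assumes the mask is a 16-digit binary string and that the CPU bit index is socket * cores_per_socket + core, both inferred from the log lines.

```python
def place(nprocs, sockets=2, cores_per_socket=4, busy=frozenset()):
    """Spread ranks round-robin over sockets, using the lowest free core each time.

    busy is a set of (socket, core) pairs already bound by other jobs.
    Returns a list of (socket, core, mask) tuples, one per rank.
    """
    free = {
        s: [c for c in range(cores_per_socket) if (s, c) not in busy]
        for s in range(sockets)
    }
    placement = []
    s = 0
    for _rank in range(nprocs):
        # Advance to the next socket that still has a free core.
        for _ in range(sockets):
            if free[s]:
                break
            s = (s + 1) % sockets
        else:
            raise RuntimeError("not enough free cores on this node")
        core = free[s].pop(0)
        cpu = s * cores_per_socket + core  # assumed linear numbering
        placement.append((s, core, format(1 << cpu, "016b")))
        s = (s + 1) % sockets  # next rank starts on the next socket
    return placement


if __name__ == "__main__":
    first = place(4)
    second = place(4, busy=frozenset((s, c) for s, c, _ in first))
    for name, job in (("First job", first), ("Second job", second)):
        print(name + ":")
        for rank, (s, c, mask) in enumerate(job):
            print("  process %d mask=%s socket=%d core=%d" % (rank, mask, s, c))
```

Alternating sockets gives each process its own share of memory bandwidth for as long as possible, which is presumably what the BANDWIDTH policy name refers to. The key point is that the second job is placed relative to what is still free, not from core 0 onwards.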
Håkon