[Beowulf] numad?
Michael Di Domenico
mdidomenico4 at gmail.com
Tue Jan 18 18:18:01 UTC 2022
does anyone turn numad on or off on their clusters? I'm running RHEL 7.9
on Intel CPUs and seeing a heavy performance impact on MPI jobs when
running numad.
diagnosis is pretty preliminary right now, so i'm light on details. when
running numad i'm seeing MPI jobs stall while numad pokes at the job.
the stall is noticeable, around 10-12 seconds.
it's particularly interesting because if one rank stalls while numad
runs, the others wait. once it frees they all continue, but then
another rank gets hit, so i end up seeing this cyclic stall
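
A toy model of the cyclic stall described above, for illustration only. It assumes the job is bulk-synchronous (every rank waits at a sync point each step) and that exactly one rank is stalled per step in round-robin order; the function names, rank count, and timings are made up for the sketch and are not measurements from the cluster.

```python
def step_time(compute_s, stalls_s):
    """Wall time of one synchronized step: all ranks wait for the slowest one."""
    return max(compute_s + stall for stall in stalls_s)


def run(n_ranks, n_steps, compute_s, stall_s):
    """Total wall time when one rank per step is stalled (round-robin assumption)."""
    total = 0.0
    for step in range(n_steps):
        stalls = [0.0] * n_ranks
        stalls[step % n_ranks] = stall_s  # a different rank gets hit each step
        total += step_time(compute_s, stalls)
    return total


if __name__ == "__main__":
    baseline = run(n_ranks=4, n_steps=8, compute_s=1.0, stall_s=0.0)
    stalled = run(n_ranks=4, n_steps=8, compute_s=1.0, stall_s=10.0)
    print(f"no stalls: {baseline:.1f}s, one stalled rank per step: {stalled:.1f}s")
```

Under these assumptions the whole job pays the full stall on every step even though each individual rank is stalled only a quarter of the time, which would match the "one stalls, the others wait" behaviour.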
like i said, i'm still looking into things, but i'm curious what
everyone's take on numad is. my feeling is we probably don't even
really need it, since slurm/openmpi should be handling process
placement anyhow
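
One way to sanity-check that placement really is being handled by the launcher is to have each rank report its own CPU affinity. The sketch below is an assumption about how one might do that, not something from the original setup: it reads the rank from the `OMPI_COMM_WORLD_RANK` or `SLURM_PROCID` environment variables (set by Open MPI and Slurm respectively) and uses Linux's `os.sched_getaffinity`; the helper names are hypothetical.

```python
import os


def current_affinity():
    """Sorted CPU ids this process is allowed to run on (Linux); all CPUs otherwise."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity: assume no pinning.
    return list(range(os.cpu_count() or 1))


def rank_label():
    """Best-effort rank id from the launcher's environment, or '?' if unset."""
    for var in ("OMPI_COMM_WORLD_RANK", "SLURM_PROCID"):
        if var in os.environ:
            return os.environ[var]
    return "?"


if __name__ == "__main__":
    cpus = current_affinity()
    print(f"rank {rank_label()} pid {os.getpid()} allowed cpus: {cpus}")
```

Launched once per rank under srun or mpirun, a short CPU list per rank suggests the launcher is already pinning processes; a list covering every core suggests it is not.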
thoughts?