[Beowulf] numactl and load balancing
mathog
mathog at caltech.edu
Thu Jul 23 12:03:17 PDT 2015
Dell with 2 CPUs x 12 cores x 2 threads, which shows up in procinfo as 48 CPUs.
I am trying to run 30 processes, one on each of 30 different "CPU"s, by
starting them one at a time with
numactl -C 1-30 /$PATH/program #args...
When 30 have started, the script spins waiting for one to exit, then
another is started. "top" shows that some of these are running at 50%
CPU, so they are being started on a CPU which already has a job going.
I can see where that would happen, since there doesn't seem to be
anything in numactl about load balancing. The thing is, these processes
are _staying_ on the same CPU, never migrating to another. That I don't
understand. I would have thought numactl sets some mask on the process
restricting the CPUs it can move to, but would not otherwise affect it,
so the OS should migrate it when it sees this situation. In practice it
seems to leave it running on whichever CPU it starts on. Or does Linux
not migrate processes when they are heavily loading a single CPU, only
when they run out of memory?
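A minimal sketch for checking this on a running job, assuming a Linux /proc filesystem. It prints the CPU list the kernel allows for a PID (the Cpus_allowed_list field of /proc/PID/status) and the CPU the task last ran on (field 39 of /proc/PID/stat). The function names allowed_cpus and current_cpu are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the list of CPUs a process may run on (its affinity mask as a list).
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Print the CPU the process last ran on.
# /proc/<pid>/stat field 39 is "processor"; after stripping "pid (comm) "
# the remaining fields start at field 3, so field 39 becomes $37.
current_cpu() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{print $37}'
}

# Example: report both values for the current shell.
echo "pid $$: allowed=$(allowed_cpus $$) last_ran_on=$(current_cpu $$)"
```

If the allowed list is a range such as 1-30 but the last-ran-on value never changes over repeated calls, the mask is as wide as expected and the job is simply not being moved.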
Also "perf top" shows 81% for the program and 13% for numactl.
The goal here is to carefully divvy up the load so that exactly 15 jobs
run on each NUMA zone, since then the data in all the inner loops will
fit within the 30M of L3 cache on each CPU. If it puts 17 on one and 13
on the other, the inner loop data won't fit and performance slows down
dramatically.
Looks like I need to keep track of which job is running where and
numactl lock it to that node. (I don't think there is a queue system on
this machine at present.)
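One way to do that bookkeeping is sketched below in bash, as a rough outline rather than a tested tool. It assumes the two NUMA nodes are numbered 0 and 1 (worth confirming with "numactl --hardware"), needs bash 4.3 or later for "wait -n", and binds each job to a node with --cpunodebind and --membind instead of -C. The memory binding is an addition of this sketch, and the names NODE_OF_PID, count_on_node, pick_node, reap_finished and launch are invented for it.

```shell
#!/usr/bin/env bash
MAX_PER_NODE=15            # cap from the post: 15 jobs per NUMA zone
NODES=(0 1)                # ASSUMPTION: node ids, check `numactl --hardware`
declare -A NODE_OF_PID=()  # pid -> node, for jobs believed to be running

# Print how many tracked jobs are bound to the given node.
count_on_node() {
    local n=0 pid
    for pid in "${!NODE_OF_PID[@]}"; do
        if [[ ${NODE_OF_PID[$pid]} == "$1" ]]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Print the least-loaded node that is still under the cap.
# Returns non-zero (and prints nothing) when every node is full.
pick_node() {
    local best="" best_n=$MAX_PER_NODE node n
    for node in "${NODES[@]}"; do
        n=$(count_on_node "$node")
        if (( n < best_n )); then best=$node; best_n=$n; fi
    done
    [[ -n $best ]] && echo "$best"
}

# Forget every tracked pid that no longer exists.
reap_finished() {
    local pid
    for pid in "${!NODE_OF_PID[@]}"; do
        kill -0 "$pid" 2>/dev/null || unset "NODE_OF_PID[$pid]"
    done
}

# Start one job bound to a node with a free slot, blocking until one opens.
launch() {
    local node
    until node=$(pick_node); do
        wait -n            # block until some background job exits
        reap_finished
    done
    numactl --cpunodebind="$node" --membind="$node" "$@" &
    NODE_OF_PID[$!]=$node
}
```

The driving script would call launch with the program and its arguments once per job and finish with a plain wait. Within a node the kernel stays free to move jobs between that node's CPUs, so only the per-node count is enforced.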
Thanks,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech