[Beowulf] NUMA zone weirdness

John Hearns hearnsj at googlemail.com
Fri Dec 16 06:36:03 PST 2016


This is in the context of Omni-Path cards and the hfi1 driver.
In the file pio.c there is a check on the NUMA zones being online:



<http://lxr.free-electrons.com/source/drivers/staging/rdma/hfi1/pio.c?v=4.4#L1711>

    num_numa = num_online_nodes();
    /* enforce the expectation that the numas are compact */
    for (i = 0; i < num_numa; i++) {
            if (!node_online(i)) {
                    dd_dev_err(dd, "NUMA nodes are not compact\n");
                    ret = -EINVAL;
                    goto done;
            }
    }
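
To make clear what that loop expects, here is a minimal user-space sketch of the same logic. It is not the driver code: the sim_* names and the bitmask representation (bit n set means node n is online) are my own assumptions for illustration. The point is that the loop only ever looks at IDs 0..N-1, where N is the number of online nodes, so any gap in the numbering trips it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for node_online(): bit `node` of online_mask
 * set means that NUMA node is online. */
static int sim_node_online(uint64_t online_mask, int node)
{
    assert(node >= 0 && node < 64);
    return (int)((online_mask >> node) & 1u);
}

/* Hypothetical stand-in for num_online_nodes(): counts the set bits. */
static int sim_num_online_nodes(uint64_t online_mask)
{
    int count = 0;

    for (int node = 0; node < 64; node++)
        count += sim_node_online(online_mask, node);
    return count;
}

/* Same shape as the pio.c loop: returns 0 if node IDs 0..N-1 are all
 * online, -1 otherwise (the driver itself returns -EINVAL). */
static int sim_check_compact(uint64_t online_mask)
{
    int num_numa = sim_num_online_nodes(online_mask);

    for (int i = 0; i < num_numa; i++) {
        if (!sim_node_online(online_mask, i))
            return -1;
    }
    return 0;
}
```

With mask 0x3 (nodes 0 and 1 online) sim_check_compact() returns 0. With mask 0x5 (nodes 0 and 2 online, which is the layout numactl reports further down) it returns -1, because there are two online nodes, so the loop tests IDs 0 and 1, and node 1 is not online.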




On some of my servers I see this weirdness with the NUMA zones:

(2650-v4 processors, HT is off)

[root@comp006 ~]# numactl --hardware
available: 2 nodes (0,2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 18 19 20 21 22 23
node 0 size: 32673 MB
node 0 free: 29840 MB
node 2 cpus: 12 13 14 15 16 17
node 2 size: 32768 MB
node 2 free: 31753 MB
node distances:
node   0   2
  0:  10  20
  2:  20  10



Someone will be along in a minute to explain why.

I am sure this is a BIOS setting, but which one is not making itself clear
to me.