[Beowulf] NUMA zone weirdness

Brice Goglin brice.goglin at gmail.com
Sun Dec 18 13:28:17 PST 2016


Hello,

Do you know if all your CPU memory channels are populated? CoD requires
that each half of the CPU has some memory DIMMs (so that each NUMA node
actually contains some memory). If both channels of one half are empty,
the corresponding NUMA node can disappear.
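
A quick sanity check on Linux (a sketch; it assumes root access and that
dmidecode and hwloc are installed) is to compare the populated DIMM
slots against the memory the kernel actually sees on each NUMA node:

    # list every DIMM slot and the size of the module in it (if any)
    dmidecode -t memory | grep -E 'Locator:|Size:'

    # per-node memory totals as the kernel sees them
    grep MemTotal /sys/devices/system/node/node*/meminfo

    # hwloc's lstopo draws the full NUMA/cache/core layout
    lstopo --no-io
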
Brice




On 16/12/2016 23:26, Elken, Tom wrote:
> Hi John and Greg,
>
> You showed nodes 0 & 2 (no node 1) and a strange CPU assignment to nodes!
> Even though you had Cluster On Die (CoD) enabled in your BIOS, I have
> never seen that arrangement of NUMA nodes and CPUs. You may have a bug
> in your BIOS or OS?
> With CoD enabled, I would have expected 4 NUMA nodes, 0-3, with 6 cores
> assigned to each one.
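>
> As a rough sketch of what I'd expect (assuming 2 sockets with 12 cores
> each and both halves of each die populated with memory; the exact CPU
> numbering depends on the BIOS's enumeration), numactl --hardware should
> report something like:
>
>   available: 4 nodes (0-3)
>   node 0 cpus: 0 1 2 3 4 5
>   node 1 cpus: 6 7 8 9 10 11
>   node 2 cpus: 12 13 14 15 16 17
>   node 3 cpus: 18 19 20 21 22 23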
>
> The Omni-Path Performance Tuning User Guide 
> http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v6_0.pdf 
> does recommend disabling CoD in Xeon BIOSes (Table 2 on p. 12), but it's not a hard prohibition.
> Disabling it improves some fabric performance benchmarks, while enabling it
> helps the performance of some single-node applications, which could outweigh
> the fabric aspects.
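>
> One way to gauge the single-node side of that trade-off (a sketch;
> STREAM is just one example of a bandwidth-sensitive workload) is to pin
> a benchmark to a single NUMA node's cores and memory, then compare runs
> with the two BIOS settings:
>
>   numactl --cpunodebind=0 --membind=0 ./stream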
>
> -Tom
>
>> -----Original Message-----
>> From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Greg
>> Lindahl
>> Sent: Friday, December 16, 2016 2:00 PM
>> To: John Hearns
>> Cc: Beowulf Mailing List
>> Subject: Re: [Beowulf] NUMA zone weirdness
>>
>> Wow, that's pretty obscure!
>>
>> I'd recommend reporting it to Intel so that they can add it to the
>> descendants of ipath_checkout / ipath_debug. It's exactly the kind of
>> hidden gotcha that leads to unhappy systems!
>>
>> -- greg
>>
>> On Fri, Dec 16, 2016 at 03:52:34PM +0000, John Hearns wrote:
>>> Problem solved.
>>> I have changed the QPI Snoop Mode on these servers from
>>> Cluster on Die (Enabled) to Disabled, and they now display what I take
>>> to be correct behaviour, i.e.:
>>>
>>> [root at comp006 ~]# numactl --hardware
>>> available: 2 nodes (0-1)
>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
>>> node 0 size: 32673 MB
>>> node 0 free: 31541 MB
>>> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
>>> node 1 size: 32768 MB
>>> node 1 free: 31860 MB
>>> node distances:
>>> node   0   1
>>>   0:  10  21
>>>   1:  21  10


