<div dir="ltr">All, thank you very much for looking at this, and Happy Christmas when it comes.<div>And a Guid New Year tae ye a'.</div><div><br></div><div>Number of DIMMs doubled, all channels now populated. Et voilà:</div><div><br></div><div>(HT is on, CoD is enabled)</div><div><br></div><div><div>[root@comp007 ~]# numactl --hardware</div><div>available: 4 nodes (0-3)</div><div>node 0 cpus: 0 1 2 3 4 5 24 25 26 27 28 29</div><div>node 0 size: 32673 MB</div><div>node 0 free: 31712 MB</div><div>node 1 cpus: 6 7 8 9 10 11 30 31 32 33 34 35</div><div>node 1 size: 32768 MB</div><div>node 1 free: 31926 MB</div><div>node 2 cpus: 12 13 14 15 16 17 36 37 38 39 40 41</div><div>node 2 size: 32768 MB</div><div>node 2 free: 31972 MB</div><div>node 3 cpus: 18 19 20 21 22 23 42 43 44 45 46 47</div><div>node 3 size: 32768 MB</div><div>node 3 free: 31953 MB</div><div>node distances:</div><div>node 0 1 2 3</div><div> 0: 10 11 21 21</div><div> 1: 11 10 21 21</div><div> 2: 21 21 10 11</div><div> 3: 21 21 11 10</div></div><div><br></div><div><div>[root@comp007 ~]# lscpu</div><div>Architecture: x86_64</div><div>CPU op-mode(s): 32-bit, 64-bit</div><div>Byte Order: Little Endian</div><div>CPU(s): 48</div><div>On-line CPU(s) list: 0-47</div><div>Thread(s) per core: 2</div><div>Core(s) per socket: 12</div><div>Socket(s): 2</div><div>NUMA node(s): 4</div><div>Vendor ID: GenuineIntel</div><div>CPU family: 6</div><div>Model: 79</div><div>Model name: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz</div><div>Stepping: 1</div><div>CPU MHz: 1200.289</div><div>BogoMIPS: 4396.61</div><div>Virtualization: VT-x</div><div>L1d cache: 32K</div><div>L1i cache: 32K</div><div>L2 cache: 256K</div><div>L3 cache: 15360K</div><div>NUMA node0 CPU(s): 0-5,24-29</div><div>NUMA node1 CPU(s): 6-11,30-35</div><div>NUMA node2 CPU(s): 12-17,36-41</div><div>NUMA node3 CPU(s): 18-23,42-47</div></div><div><br></div></div><div class="gmail_extra"><br><div 
class="gmail_quote">On 19 December 2016 at 03:26, John Hearns <span dir="ltr"><<a href="mailto:hearnsj@googlemail.com" target="_blank">hearnsj@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Brice, thank you for the reply. You have the answer - these systems have two DIMMs per socket, in channels 0 and 1, so not all channels are populated.<div><br></div><div>I had the lstopo output and the tarball all ready for the OpenMPI list too! Should have sent it over there.</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On 18 December 2016 at 21:28, Brice Goglin <span dir="ltr"><<a href="mailto:brice.goglin@gmail.com" target="_blank">brice.goglin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello<br>
Do you know if all your CPU memory channels are populated? CoD requires<br>
that each half of the CPU has some memory DIMMs (so that each NUMA node<br>
actually contains some memory). If both channels of one half are empty,<br>
the NUMA node might somehow disappear.<br>
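One way to check this from software is to count populated vs. empty DIMM slots in `dmidecode --type memory` output. A minimal sketch follows; the real command needs root and real hardware, so a small sample of its output is parsed here, and the exact field wording is an assumption (it can vary slightly between BIOS vendors):<br>

```shell
# Sketch: count populated vs. empty DIMM slots.
# On a live system: dmidecode --type memory
# A mocked-up sample of that output stands in here.
sample='        Size: 16384 MB
        Locator: DIMM_A1
        Size: No Module Installed
        Locator: DIMM_A2'
populated=$(printf '%s\n' "$sample" | grep -c 'Size: [0-9]')
empty=$(printf '%s\n' "$sample" | grep -c 'No Module Installed')
echo "populated=$populated empty=$empty"
```

If `empty` is nonzero, the Locator fields tell you which channels the empty slots sit on; a half of the CPU with all its channels empty is exactly the case where a NUMA node would vanish.<br>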
<span class="m_7589805721908537127HOEnZb"><font color="#888888">Brice<br>
</font></span><div class="m_7589805721908537127HOEnZb"><div class="m_7589805721908537127h5"><br>
<br>
<br>
<br>
On 16/12/2016 at 23:26, Elken, Tom wrote:<br>
> Hi John and Greg,<br>
><br>
> You showed Nodes 0 & 2 (no node 1) and a strange CPU assignment to nodes!<br>
> Even though you had Cluster On Die (CoD) Enabled in your BIOS, I have never seen that arrangement of NUMA nodes and CPUs. You may have a bug in your BIOS or OS?<br>
> With CoD enabled, I would have expected 4 NUMA nodes, 0-3, and 6 cores assigned to each one.<br>
><br>
> The Omni-Path Performance Tuning User Guide<br>
> <a href="http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_OP_Performance_Tuning_UG_H93143_v6_0.pdf" rel="noreferrer" target="_blank">http://www.intel.com/content/d<wbr>am/support/us/en/documents/net<wbr>work-and-i-o/fabric-products/<wbr>Intel_OP_Performance_Tuning_<wbr>UG_H93143_v6_0.pdf</a><br>
> does recommend Disabling CoD in Xeon BIOSes (Table 2 on P. 12), but it's not considered a hard prohibition.<br>
> Disabling improves some fabric performance benchmarks, but Enabling helps the performance of some single-node applications, which could outweigh the fabric performance aspects.<br>
><br>
> -Tom<br>
><br>
>> -----Original Message-----<br>
>> From: Beowulf [mailto:<a href="mailto:beowulf-bounces@beowulf.org" target="_blank">beowulf-bounces@beowul<wbr>f.org</a>] On Behalf Of Greg<br>
>> Lindahl<br>
>> Sent: Friday, December 16, 2016 2:00 PM<br>
>> To: John Hearns<br>
>> Cc: Beowulf Mailing List<br>
>> Subject: Re: [Beowulf] NUMA zone weirdness<br>
>><br>
>> Wow, that's pretty obscure!<br>
>><br>
>> I'd recommend reporting it to Intel so that they can add it to the<br>
>> descendants of ipath_checkout / ipath_debug. It's exactly the kind of<br>
>> hidden gotcha that leads to unhappy systems!<br>
>><br>
>> -- greg<br>
>><br>
>> On Fri, Dec 16, 2016 at 03:52:34PM +0000, John Hearns wrote:<br>
>>> Problem solved.<br>
>>> I have changed the QPI Snoop Mode on these servers from<br>
>>> Cluster on Die Enabled to Disabled, and they display what I take to be the correct<br>
>>> behaviour, i.e.:<br>
>>><br>
>>> [root@comp006 ~]# numactl --hardware<br>
>>> available: 2 nodes (0-1)<br>
>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11<br>
>>> node 0 size: 32673 MB<br>
>>> node 0 free: 31541 MB<br>
>>> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23<br>
>>> node 1 size: 32768 MB<br>
>>> node 1 free: 31860 MB<br>
>>> node distances:<br>
>>> node 0 1<br>
>>> 0: 10 21<br>
>>> 1: 21 10<br>
>> ______________________________<wbr>_________________<br>
>> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
>> To change your subscription (digest mode or unsubscribe) visit<br>
>> <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">http://www.beowulf.org/mailman<wbr>/listinfo/beowulf</a><br>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>