[Beowulf] NUMA zone weirdness
hearnsj at googlemail.com
Sun Dec 18 19:26:01 PST 2016
Brice, thank you for the reply. You have the answer - these systems have two
DIMMs per socket, channels 0 and, so not all channels are populated.
I had the lstopo output and the tarball all ready for the OpenMPI list too!
Should have sent it over there.
On 18 December 2016 at 21:28, Brice Goglin <brice.goglin at gmail.com> wrote:
> Do you know if all your CPU memory channels are populated? CoD requires
> that each half of the CPU has some memory DIMMs (so that each NUMA node
> actually contains some memory). If both channels of one half are empty,
> the NUMA node might somehow disappear.
> On 16/12/2016 23:26, Elken, Tom wrote:
> > Hi John and Greg,
> > You showed Nodes 0 & 2 (no node 1) and a strange CPU assignment to
> > Even though you had Cluster On Die (CoD) Enabled in your BIOS, I have
> > never seen that arrangement of NUMA nodes and CPUs. You may have a bug in
> > your BIOS or OS?
> > With CoD enabled, I would have expected 4 NUMA nodes, 0-3, and 6 cores
> > assigned to each one.
> > The Omni-Path Performance Tuning User Guide
> > http://www.intel.com/content/dam/support/us/en/documents/
> > does recommend Disabling CoD in Xeon BIOSes (Table 2 on P. 12), but
> > it's not considered a hard prohibition.
> > Disabling improves some fabric performance benchmarks, but Enabling
> > helps some single-node application performance, which could outweigh the
> > fabric performance aspects.
> > -Tom
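Tom's expectation above (4 NUMA nodes of 6 cores with CoD enabled, against the 2 nodes of 12 cores reported once it was disabled) can be written down as a small calculation. This is only a sketch of the simple model described in this thread, assuming CoD splits each socket into two equal NUMA nodes; the function name `expected_numa_layout` and its parameters are made up for illustration and are not part of any tool mentioned here.

```python
def expected_numa_layout(sockets, cores_per_socket, cod_enabled):
    """Return (numa_nodes, cores_per_node) under a simple CoD model.

    Assumption (not a vendor specification): with Cluster On Die enabled,
    each socket is exposed as two equal NUMA nodes; otherwise as one.
    """
    nodes_per_socket = 2 if cod_enabled else 1
    if cores_per_socket % nodes_per_socket:
        raise ValueError("cores_per_socket must divide evenly across NUMA nodes")
    numa_nodes = sockets * nodes_per_socket
    cores_per_node = cores_per_socket // nodes_per_socket
    return numa_nodes, cores_per_node


if __name__ == "__main__":
    # Two sockets of 12 cores each, matching the numactl output quoted in this thread.
    for cod in (True, False):
        nodes, cores = expected_numa_layout(2, 12, cod)
        state = "enabled" if cod else "disabled"
        print(f"CoD {state}: {nodes} NUMA nodes, {cores} cores each")
```

A layout that matches neither result, such as the nodes 0 and 2 reported earlier in the thread, is a hint to look at BIOS settings or DIMM population.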
> >> -----Original Message-----
> >> From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Greg
> >> Lindahl
> >> Sent: Friday, December 16, 2016 2:00 PM
> >> To: John Hearns
> >> Cc: Beowulf Mailing List
> >> Subject: Re: [Beowulf] NUMA zone weirdness
> >> Wow, that's pretty obscure!
> >> I'd recommend reporting it to Intel so that they can add it to the
> >> descendants of ipath_checkout / ipath_debug. It's exactly the kind of
> >> hidden gotcha that leads to unhappy systems!
> >> -- greg
> >> On Fri, Dec 16, 2016 at 03:52:34PM +0000, John Hearns wrote:
> >>> Problem solved.
> >>> I have changed the QPI Snoop Mode on these servers from
> >>> ClusterOnDie Enabled to Disabled and they display what I take to be
> >>> the expected behaviour - i.e.
> >>> [root@comp006 ~]# numactl --hardware
> >>> available: 2 nodes (0-1)
> >>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> >>> node 0 size: 32673 MB
> >>> node 0 free: 31541 MB
> >>> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
> >>> node 1 size: 32768 MB
> >>> node 1 free: 31860 MB
> >>> node distances:
> >>> node   0   1
> >>>   0:  10  21
> >>>   1:  21  10
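The kind of problem discussed in this thread (a node ID that is skipped, or a node with no memory behind it) could be flagged automatically from the `numactl --hardware` text. The following is a rough sketch, assuming the output format shown above; the helper names `parse_numactl_hardware` and `numa_warnings` are hypothetical, and the checks are a guess at what is worth flagging, not an established diagnostic.

```python
import re


def parse_numactl_hardware(text):
    """Parse `numactl --hardware` text into {node_id: {"cpus": [...], "size_mb": int}}."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"\s*node (\d+) cpus:(.*)$", line)
        if m:
            cpus = [int(c) for c in m.group(2).split()]
            nodes.setdefault(int(m.group(1)), {})["cpus"] = cpus
            continue
        m = re.match(r"\s*node (\d+) size: (\d+) MB", line)
        if m:
            nodes.setdefault(int(m.group(1)), {})["size_mb"] = int(m.group(2))
    return nodes


def numa_warnings(nodes):
    """Return human-readable warnings for gaps in node IDs, or nodes lacking memory or CPUs."""
    warnings = []
    ids = sorted(nodes)
    if ids:
        missing = sorted(set(range(ids[-1] + 1)) - set(ids))
        if missing:
            warnings.append(f"node IDs not contiguous, missing: {missing}")
    for node_id in ids:
        info = nodes[node_id]
        if info.get("size_mb", 0) == 0:
            warnings.append(f"node {node_id} reports no memory")
        if not info.get("cpus"):
            warnings.append(f"node {node_id} has no CPUs")
    return warnings


# The output quoted above, taken after CoD was disabled.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 32673 MB
node 0 free: 31541 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
node 1 size: 32768 MB
node 1 free: 31860 MB
"""

if __name__ == "__main__":
    problems = numa_warnings(parse_numactl_hardware(SAMPLE))
    print("\n".join(problems) if problems else "no NUMA layout warnings")
```

In practice the text would come from running `numactl --hardware` on each compute node, for example through `subprocess.run(["numactl", "--hardware"], capture_output=True, text=True)`.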
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf