[Beowulf] [External] numad?

Prentice Bisbal pbisbal at pppl.gov
Wed Jan 19 15:51:28 UTC 2022


It all depends on the characteristics of the job. If the job is 
CPU-bound, then yes, spreading the job across the cores will improve 
performance. If memory access patterns are limiting the performance, 
then numad should help.
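
If memory locality is the suspect, one quick way to get a feel for whether a
job's pages are actually spread across NUMA nodes is to total up the per-node
counters in /proc/<pid>/numa_maps. A rough sketch (Linux only, stdlib Python,
pass it a PID; numastat -p gives you roughly the same picture):

    import sys
    from collections import Counter

    # Sum the "N<node>=<pages>" counters in /proc/<pid>/numa_maps to see how a
    # process's resident pages are spread across NUMA nodes.  You generally
    # need to own the process (or be root) to read this file.
    pid = sys.argv[1]
    pages = Counter()
    with open(f"/proc/{pid}/numa_maps") as f:
        for line in f:
            for token in line.split():
                node, sep, count = token.partition("=")
                if sep and node.startswith("N") and node[1:].isdigit() and count.isdigit():
                    pages[f"node{node[1:]}"] += int(count)
    total = sum(pages.values()) or 1
    for node, count in sorted(pages.items()):
        print(f"{node}: {count} pages ({100 * count / total:.1f}%)")

If most of the pages sit on one node while the ranks are spread across
sockets, remote memory access is a plausible limiter and numad (or explicit
numactl binding) has something to work with.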

Prentice

On 1/19/22 10:44 AM, Jonathan Aquilina via Beowulf wrote:
> The question that comes to mind for me is: what will happen to performance 
> on a normal user workstation? Wouldn't performance improve as work is 
> being spread across the CPU cores?
>
> Regards,
> Jonathan
> ------------------------------------------------------------------------
> *From:* Beowulf <beowulf-bounces at beowulf.org> on behalf of Michael Di 
> Domenico <mdidomenico4 at gmail.com>
> *Sent:* 19 January 2022 13:52
> *Cc:* Beowulf Mailing List <beowulf at beowulf.org>
> *Subject:* Re: [Beowulf] [External] numad?
> Yes, I agree. numad definitely seems like something that's handy for
> workstations where processes are short-lived and sprayed all over the
> place.  I'll probably yank it from all my systems.  It's odd that I've
> never noticed the performance impact before; perhaps something changed
> in the code.
>
> On Tue, Jan 18, 2022 at 4:50 PM Prentice Bisbal via Beowulf
> <beowulf at beowulf.org> wrote:
> >
> > Just to add to my earlier comment below. I think numad is something
> > that's really meant for non-HPC environments where latency-hiding is
> > more important than all-out performance. Kinda like hyperthreading: on
> > HPC workloads it provides marginal improvement at best, but it's very
> > helpful on non-HPC workloads (or so I've been told; I have no firsthand
> > professional experience with hyperthreading).
> >
> > Prentice
> >
> > On 1/18/22 2:56 PM, Prentice Bisbal wrote:
> > > Mike,
> > >
> > > I turn it off. When I had it on, it would cause performance to tank.
> > > Doing some basic analysis, it appeared numad would move all the work
> > > onto a single core, leaving all the others idle. Without knowing the
> > > inner workings of numad, my guess is that it saw the processes
> > > accessing the same region of memory, so it moved all the processes to
> > > the core "closest" to that memory.
> > >
> > > I didn't do any in-depth analysis, but turning off numad definitely
> > > fixed that problem. The problem first appeared with a user code, and I
> > > was able to reproduce it with HPL. It took 10-20 minutes for numad
> > > to start migrating processes to the same core, so smaller "test" jobs
> > > didn't trigger the behavior, and my first attempts at reproducing it
> > > were unsuccessful. It wasn't until I ran "full" HPL tests on a node
> > > that I was able to reproduce the problem.
> > >
> > > I think I used turbostat or something like that to watch the load
> > > and/or processor freqs on the individual cores.
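
(For anyone who wants to watch for that same pile-up without turbostat, here
is a rough stdlib-only sketch that just polls which CPU each process last ran
on; pass it the rank PIDs:)

    import sys
    import time

    # Poll /proc/<pid>/stat and report the CPU each process last ran on
    # ("processor", field 39).  If numad is herding the ranks together, the
    # number of distinct CPUs collapses toward 1.
    pids = sys.argv[1:]
    while pids:
        cpus = {}
        for pid in pids:
            try:
                with open(f"/proc/{pid}/stat") as f:
                    after_comm = f.read().rsplit(")", 1)[1].split()
                cpus[pid] = int(after_comm[36])   # field 39 overall, index 36 after the comm field
            except FileNotFoundError:
                cpus[pid] = None                  # process has exited
        active = {c for c in cpus.values() if c is not None}
        print(time.strftime("%H:%M:%S"), cpus, "distinct CPUs:", len(active))
        time.sleep(5)
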
> > >
> > > Prentice
> > >
> > > On 1/18/22 1:18 PM, Michael Di Domenico wrote:
> > >> Does anyone turn numad on or off on their clusters?  I'm running RHEL 7.9
> > >> on Intel CPUs and seeing a heavy performance impact on MPI jobs when
> > >> running numad.
> > >>
> > >> Diagnosis is pretty preliminary right now, so I'm light on details. When
> > >> running numad I'm seeing MPI jobs stall while numad pokes at the job.
> > >> The stall is notable, like 10-12 seconds.
> > >>
> > >> It's particularly interesting because if one rank stalls while numad
> > >> runs, the others wait.  Once it frees, they all continue, but then
> > >> another rank gets hit, so I end up seeing this cyclic stall.
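
That cyclic pattern is roughly what you'd expect once the ranks are
synchronizing: whichever rank numad is poking holds everyone else at the next
collective. A toy sketch of that amplification (assumes mpi4py is installed;
it doesn't model numad itself, just a rotating per-rank stall):

    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    for step in range(size):
        if step == rank:
            time.sleep(3)        # pretend numad is poking at this rank for a few seconds
        t0 = MPI.Wtime()
        comm.Barrier()           # everyone else sits here until the stalled rank arrives
        wait = MPI.Wtime() - t0
        worst = comm.reduce(wait, op=MPI.MAX, root=0)
        if rank == 0:
            print(f"step {step}: worst barrier wait {worst:.1f}s")

Run it with a few ranks (e.g. mpirun -np 4 python stall_demo.py) and every
step reports a roughly 3-second wait even though only one rank was ever
delayed.
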
> > >>
> > >> Like I said, I'm still looking into things, but I'm curious what
> > >> everyone's take on numad is.  My feeling is we probably don't even
> > >> really need it, since Slurm/Open MPI should be handling process
> > >> placement anyhow.
> > >>
> > >> Thoughts?
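
On the placement point: one quick sanity check is to print each rank's
affinity mask from inside the job and see what slurm/openmpi actually handed
it. A minimal sketch (Linux only; OMPI_COMM_WORLD_RANK and SLURM_PROCID are
the rank variables I'd expect from Open MPI and Slurm, adjust if your stack
differs):

    import os
    import socket

    # Print this process's CPU affinity as set up by the launcher.  Run one
    # copy per rank, e.g. under srun or mpirun.
    rank = os.environ.get("SLURM_PROCID") or os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    cpus = sorted(os.sched_getaffinity(0))
    print(f"{socket.gethostname()} rank {rank}: bound to CPUs {cpus}")

If every rank reports the full CPU set, nothing has pinned them and numad (or
the kernel scheduler) is free to shuffle them around.
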
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf