[Beowulf] [External] numad?
Ryan Novosielski
novosirj at rutgers.edu
Wed Jan 19 15:50:29 UTC 2022
It tries to rearrange processes so that their memory accesses stay local to their NUMA domain. I have not seen it stack everything on one core, but I suppose that depends on how many cores per NUMA node you have.
There are tuning options too. Probably not appropriate for HPC in most cases, but also not one of those things everyone should just strip out.
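For anyone who wants to verify where a process's memory actually landed, here is a minimal sketch, assuming a Linux system where /proc/<pid>/numa_maps is readable (other users' pids generally need root); it sums the per-node "N<node>=<pages>" fields to show how many resident pages sit on each NUMA node:

    import re, sys
    from collections import Counter

    def numa_pages(pid):
        """Sum resident pages per NUMA node from /proc/<pid>/numa_maps."""
        pages = Counter()
        with open(f"/proc/{pid}/numa_maps") as f:
            for line in f:
                # Each mapping line carries zero or more "N<node>=<npages>" fields.
                for node, n in re.findall(r"N(\d+)=(\d+)", line):
                    pages[int(node)] += int(n)
        return pages

    if __name__ == "__main__":
        pid = sys.argv[1] if len(sys.argv) > 1 else "self"
        for node, n in sorted(numa_pages(pid).items()):
            print(f"node {node}: {n} pages")

Run it against an MPI rank's pid before and after numad has had time to act, and you can see whether pages are being migrated between nodes.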
On Jan 19, 2022, at 10:44, Jonathan Aquilina via Beowulf <beowulf at beowulf.org> wrote:
The question that comes to mind for me is what happens to performance on a normal user workstation. Wouldn't performance improve as work is spread across the CPU cores?
Regards,
Jonathan
________________________________
From: Beowulf <beowulf-bounces at beowulf.org> on behalf of Michael Di Domenico <mdidomenico4 at gmail.com>
Sent: 19 January 2022 13:52
Cc: Beowulf Mailing List <beowulf at beowulf.org>
Subject: Re: [Beowulf] [External] numad?
Yes, I agree. numad definitely seems like something that's handy for
workstations where processes are short-lived and sprayed all over the
place. I'll probably yank it from all my systems. It's odd that I've
never noticed the performance impact before; perhaps something changed
in the code.
On Tue, Jan 18, 2022 at 4:50 PM Prentice Bisbal via Beowulf
<beowulf at beowulf.org> wrote:
>
> Just to add to my earlier comment below: I think numad is something
> that's really meant for non-HPC environments where latency-hiding is
> more important than all-out performance. Kinda like hyperthreading - on
> HPC workloads, it provides marginal improvement at best, but is very
> helpful on non-HPC workloads (or so I've been told - I have no firsthand
> professional experience with hyperthreading).
>
> Prentice
>
> On 1/18/22 2:56 PM, Prentice Bisbal wrote:
> > Mike,
> >
> > I turn it off. When I had it on, it would cause performance to tank.
> > Doing some basic analysis, it appeared numad would move all the work
> > to a single core, leaving all the others idle. Without knowing the
> > inner workings of numad, my guess is that it saw the processes
> > accessing the same region of memory, so it moved all the processes to
> > the core "closest" to that memory.
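A minimal sketch of how one might spot that kind of core-stacking without a full profiler, assuming Linux /proc semantics (field 39 of /proc/<pid>/stat is the CPU the task last ran on); the process name "xhpl" below is just a stand-in for whatever binary you are watching:

    import glob
    from collections import Counter

    def last_cpu(pid):
        """Return the CPU a process last ran on (field 39 of /proc/<pid>/stat)."""
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
        # comm (field 2) is in parens and may contain spaces; split after it.
        fields = data.rsplit(")", 1)[1].split()
        return int(fields[36])  # fields[0] here is field 3, so field 39 is index 36

    if __name__ == "__main__":
        cpus = Counter()
        for stat in glob.glob("/proc/[0-9]*/stat"):
            pid = stat.split("/")[2]
            try:
                with open(f"/proc/{pid}/comm") as f:
                    comm = f.read().strip()
                if comm == "xhpl":  # hypothetical target binary
                    cpus[last_cpu(pid)] += 1
            except OSError:
                continue  # process exited between listing and reading
        for cpu, n in sorted(cpus.items()):
            print(f"cpu {cpu}: {n} processes")

If numad is herding ranks together, the count piles up on one CPU instead of spreading one process per core.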
> >
> > I didn't do any in-depth analysis, but turning off numad definitely
> > fixed that problem. The problem first appeared with a user code, and I
> > was able to reproduce it with HPL. It took 10-20 minutes for numad
> > to start migrating processes to the same core, so smaller "test" jobs
> > didn't trigger the behavior, and my first attempts at reproducing it
> > were unsuccessful. It wasn't until I ran "full" HPL tests on a node
> > that I was able to reproduce the problem.
> >
> > I think I used turbostat or something like that to watch the load
> > and/or processor freqs on the individual cores.
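For a rough equivalent without turbostat, here is a sketch that samples /proc/stat twice and prints a per-core busy percentage, assuming the standard Linux /proc/stat field layout; the cores hosting the stacked processes sit near 100% while the rest go idle:

    import time

    def cpu_times():
        """Read per-core (busy, total) jiffies from /proc/stat."""
        out = {}
        with open("/proc/stat") as f:
            for line in f:
                # Per-core lines are "cpu0 ...", "cpu1 ..."; skip the aggregate "cpu" line.
                if line.startswith("cpu") and line[3].isdigit():
                    parts = line.split()
                    vals = list(map(int, parts[1:]))
                    idle = vals[3] + vals[4]  # idle + iowait
                    out[parts[0]] = (sum(vals) - idle, sum(vals))
        return out

    a = cpu_times(); time.sleep(2); b = cpu_times()
    for cpu in sorted(a, key=lambda c: int(c[3:])):
        busy = b[cpu][0] - a[cpu][0]
        total = b[cpu][1] - a[cpu][1]
        print(f"{cpu}: {100 * busy / max(total, 1):.0f}% busy")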
> >
> > Prentice
> >
> > On 1/18/22 1:18 PM, Michael Di Domenico wrote:
> >> Does anyone turn numad on or off on their clusters? I'm running RHEL 7.9
> >> on Intel CPUs and seeing a heavy performance impact on MPI jobs when
> >> running numad.
> >>
> >> The diagnosis is pretty preliminary right now, so I'm light on details.
> >> When running numad, I'm seeing MPI jobs stall while numad pokes at the
> >> job. The stall is notable, like 10-12 seconds.
> >>
> >> It's particularly interesting because if one rank stalls while numad
> >> runs, the others wait. Once it frees, they all continue, but then
> >> another rank gets hit, so I end up seeing this cyclic stall.
> >>
> >> Like I said, I'm still looking into things, but I'm curious what
> >> everyone's take on numad is. My sense is we probably don't even
> >> really need it, since Slurm/Open MPI should be handling process
> >> placement anyhow.
> >>
> >> thoughts?
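One quick sanity check that the launcher's pinning is intact (and that numad hasn't rewritten it) is to have each rank print its own affinity mask. A minimal sketch, assuming mpi4py is available; any per-rank print of os.sched_getaffinity() works the same way:

    import os
    from mpi4py import MPI  # assumption: mpi4py is installed alongside the MPI stack

    comm = MPI.COMM_WORLD
    # Each rank reports the CPUs it is currently allowed to run on.
    cpus = sorted(os.sched_getaffinity(0))
    print(f"rank {comm.Get_rank()} on {MPI.Get_processor_name()}: cpus {cpus}")

Launched with something like "mpirun -np 8 python check_affinity.py" (a hypothetical script name), a sane binding shows each rank with a small, distinct CPU set; masks that overlap or shift between samples taken mid-job would point at outside interference.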
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf