It all depends on the characteristics of the job. If the job is
CPU-bound, then yes, spreading the job across the cores will improve
performance. If memory access patterns are limiting the performance,
then numad should help.
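
A quick way to check whether cross-node memory placement is the limiter
on a given box is a local-vs-remote touch test. A minimal sketch using
libnuma (illustrative only; assumes the numactl development headers are
installed):

    /* Minimal sketch: compare local vs. remote NUMA memory access.
       Illustrative only; assumes libnuma. Build: gcc -O2 touch.c -lnuma */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <numa.h>

    #define SZ (512UL * 1024 * 1024)           /* 512 MiB test buffer */

    static double touch(volatile char *buf) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < SZ; i += 64)    /* one read per cache line */
            (void)buf[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        if (numa_available() < 0)
            return 1;                          /* not a NUMA system */
        numa_run_on_node(0);                   /* pin this process to node 0 */
        for (int node = 0; node <= numa_max_node(); node++) {
            char *buf = numa_alloc_onnode(SZ, node);
            if (!buf)
                continue;
            memset(buf, 1, SZ);                /* fault pages in on that node */
            printf("node %d: %.3f s\n", node, touch(buf));
            numa_free(buf, SZ);
        }
        return 0;
    }

If the node-0 time is clearly lower than the rest, the job is sensitive
to exactly the kind of placement numad tries to manage.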
<pre class="moz-signature" cols="72">Prentice</pre>
<div class="moz-cite-prefix">On 1/19/22 10:44 AM, Jonathan Aquilina
via Beowulf wrote:<br>
</div>
<blockquote type="cite"
cite="mid:PR3PR08MB583522F4F4292C958F41A95BA0599@PR3PR08MB5835.eurprd08.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
255, 255);">

The question that comes to mind for me is: what will happen to
performance on a normal user workstation? Wouldn't performance improve
as the work is spread across the CPU cores?
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
255, 255);">

Regards,
Jonathan

________________________________
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
Beowulf <a class="moz-txt-link-rfc2396E" href="mailto:beowulf-bounces@beowulf.org"><beowulf-bounces@beowulf.org></a> on behalf of
Michael Di Domenico <a class="moz-txt-link-rfc2396E" href="mailto:mdidomenico4@gmail.com"><mdidomenico4@gmail.com></a><br>
<b>Sent:</b> 19 January 2022 13:52<br>
<b>Cc:</b> Beowulf Mailing List <a class="moz-txt-link-rfc2396E" href="mailto:beowulf@beowulf.org"><beowulf@beowulf.org></a><br>
<b>Subject:</b> Re: [Beowulf] [External] numad?</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span
style="font-size:11pt;">
<div class="PlainText">yes i agree, numad definitely seems
like something that's handy for<br>
workstations where processes are short lived and sprayed
all over the<br>
place. i'll probably yank it from all my systems. it's
odd that i've<br>
never noticed the performance impact before, perhaps
something changed<br>
in the code<br>
<br>

On Tue, Jan 18, 2022 at 4:50 PM Prentice Bisbal via Beowulf
<beowulf@beowulf.org> wrote:
>
> Just to add to my earlier comment below. I think numad is something
> that's really meant for non-HPC environments, where latency-hiding is
> more important than all-out performance. Kinda like hyperthreading -
> on HPC workloads it provides marginal improvement at best, but is
> very helpful on non-HPC workloads (or so I've been told - I have no
> firsthand professional experience with hyperthreading).
>
> Prentice
>
> On 1/18/22 2:56 PM, Prentice Bisbal wrote:
> > Mike,
> >
> > I turn it off. When I had it on, it would cause performance to
> > tank. Doing some basic analysis, it appeared numad would move all
> > the work to a single core, leaving all the others idle. Without
> > knowing the inner workings of numad, my guess is that it saw the
> > processes accessing the same region of memory, so it moved all the
> > processes to the core "closest" to that memory.
> >
> > I didn't do any in-depth analysis, but turning off numad definitely
> > fixed that problem. The problem first appeared with a user code,
> > and I was able to reproduce it with HPL. It took 10-20 minutes for
> > numad to start migrating processes to the same core, so smaller
> > "test" jobs didn't trigger the behavior, and first attempts at
> > reproducing it were unsuccessful. It wasn't until I ran "full" HPL
> > tests on a node that I was able to reproduce the problem.
> >
> > I think I used turbostat or something like that to watch the load
> > and/or processor freqs on the individual cores.
> >
> > Prentice
> >
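
For watching this kind of migration without staring at turbostat, each
process can also report its own core. A minimal sketch (Linux/glibc
specific, illustrative only):

    /* Minimal sketch: report which core this process is on once a
       second, so numad-style migrations show up in the output.
       Linux/glibc-specific (sched_getcpu). Build: gcc -O2 whereami.c */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int last = -1;
        for (;;) {
            int cpu = sched_getcpu();
            if (cpu != last) {                 /* only report changes */
                printf("pid %d moved to cpu %d\n", getpid(), cpu);
                fflush(stdout);
                last = cpu;
            }
            sleep(1);
        }
    }

If numad is herding everything onto one core, several of these running
side by side will all converge on the same CPU number.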
> > On 1/18/22 1:18 PM, Michael Di Domenico wrote:
> >> Does anyone turn numad on or off on their clusters? I'm running
> >> RHEL 7.9 on Intel CPUs and seeing a heavy performance impact on
> >> MPI jobs when running numad.
> >>
> >> The diagnosis is pretty preliminary right now, so I'm light on
> >> details. When running numad I'm seeing MPI jobs stall while numad
> >> pokes at the job. The stall is notable, like 10-12 seconds.
> >>
> >> It's particularly interesting because if one rank stalls while
> >> numad runs, the others wait. Once it frees, they all continue, but
> >> then another rank gets hit, so I end up seeing this cyclic stall.
> >>
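
That coupling is what any synchronizing collective gives you: stall one
rank, and every rank pays at the next collective. A minimal sketch of
the effect (illustrative only; a deliberate sleep stands in for the
numad pause):

    /* Minimal sketch: one sleeping rank delays everyone at the next
       allreduce, and the "victim" rotates, giving a cyclic stall.
       Build: mpicc stall.c -o stall; run: mpirun -n 4 ./stall */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (int step = 0; step < size; step++) {
            if (rank == step)
                sleep(10);          /* stand-in for a numad-induced stall */
            double t0 = MPI_Wtime(), x = rank, sum;
            MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            if (rank == 0)
                printf("step %d: allreduce took %.1f s\n",
                       step, MPI_Wtime() - t0);
        }
        MPI_Finalize();
        return 0;
    }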
> >> Like I said, I'm still looking into things, but I'm curious what
> >> everyone's take on numad is. My feeling is we probably don't even
> >> really need it, since slurm/openmpi should be handling process
> >> placement anyhow.
> >>
> >> Thoughts?
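
On that last point: if slurm/openmpi are doing the placement, it's
cheap to have each rank prove it. A minimal sketch that prints the
affinity mask a process actually got (Linux-specific, illustrative
only):

    /* Minimal sketch: print the CPU affinity mask this process was
       given, to confirm what slurm/openmpi actually bound it to.
       Linux-specific. Build: gcc -O2 mask.c */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        cpu_set_t set;
        if (sched_getaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_getaffinity");
            return 1;
        }
        printf("pid %d bound to cpus:", getpid());
        for (int c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &set))
                printf(" %d", c);
        printf("\n");
        return 0;
    }

If each rank reports a tight, distinct mask, the scheduler is already
doing the job numad would be second-guessing.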
<pre class="moz-quote-pre" wrap="">_______________________________________________
Beowulf mailing list, <a class="moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit <a class="moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a>
</pre>