<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>It all depends on the characteristics of the job. If the job is
      CPU-bound, then yes, spreading the job across the cores will
      improve performance. If memory access patterns are limiting the
      performance, than numad should help. <br>
    </p>
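    <p>A quick way to tell which of those cases you're in is to look at
      where the job's pages actually live. Something like the rough
      sketch below, which just sums the per-node counters in
      /proc/PID/numa_maps (script name, usage and output format are
      only illustrative):</p>
    <pre>
#!/usr/bin/env python3
# Rough sketch: sum the per-node page counters (N0=..., N1=..., ...)
# from /proc/PID/numa_maps to see which NUMA nodes a process's memory
# actually lives on.  Script name and usage are illustrative.
import re
import sys
from collections import Counter

def pages_per_node(pid):
    counts = Counter()
    with open("/proc/%s/numa_maps" % pid) as f:
        for line in f:
            # each mapping line may carry tokens like "N0=1234",
            # meaning 1234 pages of that mapping sit on node 0
            for node, pages in re.findall(r"\bN(\d+)=(\d+)", line):
                counts[int(node)] += int(pages)
    return counts

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: numa_pages.py PID")
    for node, pages in sorted(pages_per_node(sys.argv[1]).items()):
        print("node %s: %s pages" % (node, pages))
    </pre>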
    <pre class="moz-signature" cols="72">Prentice</pre>
    <div class="moz-cite-prefix">On 1/19/22 10:44 AM, Jonathan Aquilina
      via Beowulf wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:PR3PR08MB583522F4F4292C958F41A95BA0599@PR3PR08MB5835.eurprd08.prod.outlook.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
        255, 255);">
        The question that comes to mind for me is: what will happen to
        performance on a normal user workstation? Wouldn't performance
        improve as work is spread across the CPU cores?</div>
      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
        255, 255);">
        <br>
      </div>
      <div style="font-family: Calibri, Arial, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
        255, 255);">
        Regards,<br>
        Jonathan</div>
      <hr style="display:inline-block;width:98%" tabindex="-1">
      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
          face="Calibri, sans-serif" color="#000000"><b>From:</b>
          Beowulf <a class="moz-txt-link-rfc2396E" href="mailto:beowulf-bounces@beowulf.org"><beowulf-bounces@beowulf.org></a> on behalf of
          Michael Di Domenico <a class="moz-txt-link-rfc2396E" href="mailto:mdidomenico4@gmail.com"><mdidomenico4@gmail.com></a><br>
          <b>Sent:</b> 19 January 2022 13:52<br>
          <b>Cc:</b> Beowulf Mailing List <a class="moz-txt-link-rfc2396E" href="mailto:beowulf@beowulf.org"><beowulf@beowulf.org></a><br>
          <b>Subject:</b> Re: [Beowulf] [External] numad?</font>
        <div> </div>
      </div>
      <div class="BodyFragment"><font size="2"><span
            style="font-size:11pt;">
            <div class="PlainText">yes i agree, numad definitely seems
              like something that's handy for<br>
              workstations where processes are short lived and sprayed
              all over the<br>
              place.  i'll probably yank it from all my systems.  it's
              odd that i've<br>
              never noticed the performance impact before, perhaps
              something changed<br>
              in the code<br>
              <br>
              On Tue, Jan 18, 2022 at 4:50 PM Prentice Bisbal via
              Beowulf<br>
              <a class="moz-txt-link-rfc2396E" href="mailto:beowulf@beowulf.org"><beowulf@beowulf.org></a> wrote:<br>
              ><br>
              > Just to add to my earlier comment below. I think
              numad is something<br>
              > that's really meant for non-HPC environments where
              latency-hiding is<br>
              > more important than all-out performance. Kinda like
              hyperthreading - on<br>
              > HPC workloads, it provides marginal improvement at
              best, but is very<br>
              > helpful on non-HPC workloads (or so I've been told -
              I have no firsthand<br>
              > professional experience with hyperthreading)<br>
              ><br>
              > Prentice<br>
              ><br>
              > On 1/18/22 2:56 PM, Prentice Bisbal wrote:<br>
              > > Mike,<br>
              > ><br>
              > > I turn it off. When I had it on, it would cause
              performance to tank.<br>
              > > Doing some basic analysis, it appeared numad
              would move all the work<br>
              > > to a single core, leaving all the others idle.
              Without knowing the<br>
              > > inner workings of numad, my guess is that it saw
              the processes<br>
              > > accessing the same region of memory, so moved
              all the processes to the<br>
              > > core "closest" to that memory.<br>
              > ><br>
              > > I didn't do any in-depth analysis, but turning
              off numad definitely<br>
              > > fixed that problem. The problem first appeared
              with a user code, and I<br>
              > > was able to reproduce it with HPL. It took 10 -
              20 minutes for numad<br>
              > > to start migrating processes to the same core,
              so smaller "test" jobs<br>
              > > didn't trigger the behavior, and first attempts at
              reproducing it<br>
              > > were unsuccessful. It wasn't until I ran "full"
              HPL tests on a node<br>
              > > that I was able to reproduce the problem.<br>
              > ><br>
              > > I think I used turbostat or something like that
              to watch the load<br>
              > > and/or processor freqs on the individual cores.<br>
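              <p>Something along these lines over /proc/stat gives a
                similar per-core view without turbostat; a rough sketch
                only, with an arbitrary interval and output format:</p>
              <pre>
#!/usr/bin/env python3
# Rough sketch of a per-core utilization monitor built on /proc/stat,
# to spot work piling up on one core.  Interval and output format are
# arbitrary choices, not anything numad- or turbostat-specific.
import time

def cpu_times():
    times = {}
    with open("/proc/stat") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                vals = list(map(int, fields[1:]))
                idle = vals[3] + vals[4]          # idle + iowait
                times[fields[0]] = (sum(vals), idle)
    return times

prev = cpu_times()
while True:
    time.sleep(5)
    cur = cpu_times()
    usage = []
    for cpu, (total, idle) in cur.items():
        dt = total - prev[cpu][0]
        di = idle - prev[cpu][1]
        pct = 100.0 * (dt - di) / dt if dt else 0.0
        usage.append((pct, cpu))
    # show the busiest cores first; all the load on one core stands out
    for pct, cpu in sorted(usage, reverse=True)[:8]:
        print("%-6s %5.1f%% busy" % (cpu, pct))
    print("----")
    prev = cur
              </pre>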
              > ><br>
              > > Prentice<br>
              > ><br>
              > > On 1/18/22 1:18 PM, Michael Di Domenico wrote:<br>
              > >> does anyone turn-on/off numad on their
              clusters?  I'm running RHEL7.9<br>
              > >> on Intel CPU's and seeing a heavy
              performance impact on MPI jobs when<br>
              > >> running numad.<br>
              > >><br>
              > >> diagnosis is pretty prelim right now, so i'm
              light on details. when<br>
              > >> running numad i'm seeing MPI jobs stall
              while numad pokes at the job.<br>
              > >> the stall is notable, like 10-12 seconds<br>
              > >><br>
              > >> it's particularly interesting because if one
              rank stalls while numad<br>
              > >> runs, the others wait.  once it frees they
              all continue, but then<br>
              > >> another rank gets hit, so i end up seeing
              this cyclic stall<br>
              > >><br>
              > >> like i said i'm still looking into things,
              but i curious what<br>
              > >> everyone's take on numa is.  my consensus is
              we probably don't even<br>
              > >> really need it since slurm/openmpi should be
              handling process<br>
              > >> placement anyhow<br>
              > >><br>
              > >> thoughts?<br>
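              <p>One way to sanity-check that placement on a running
                job is to have each rank report the CPU set it is
                actually allowed on and watch whether it changes while
                numad is active; a minimal sketch, using the usual
                Slurm/Open MPI rank variables and an arbitrary sampling
                interval:</p>
              <pre>
#!/usr/bin/env python3
# Tiny sanity check: print the CPU set this process is allowed to run
# on, a few times, so a later change (for example by numad) is easy to
# spot.  SLURM_PROCID / OMPI_COMM_WORLD_RANK are the usual Slurm and
# Open MPI rank variables; everything else here is illustrative.
import os
import socket
import time

rank = os.environ.get("SLURM_PROCID",
                      os.environ.get("OMPI_COMM_WORLD_RANK", "?"))

for _ in range(3):                      # sample a few times
    cpus = sorted(os.sched_getaffinity(0))
    print("host=%s rank=%s cpus=%s" % (socket.gethostname(), rank, cpus),
          flush=True)
    time.sleep(30)
              </pre>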
            </div>
          </span></font></div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
Beowulf mailing list, <a class="moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit <a class="moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a>
</pre>
    </blockquote>
  </body>
</html>