<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Scott, <br>

    </p>

    <div class="moz-cite-prefix">On 1/20/24 12:10 PM, Scott Atchley

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAL8g0jJ93QeDBpkWQ+85ZfDxf-U8CY=nnwgNJxuR6XS1O6GM9A@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr">On Fri, Jan 19, 2024 at 9:40 PM Prentice Bisbal

          via Beowulf &lt;<a href="mailto:beowulf@beowulf.org"

            moz-do-not-send="true" class="moz-txt-link-freetext">beowulf@beowulf.org</a>&gt;

          wrote:<br>

        </div>

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">>

            Yes, someone is sure to say "don't try characterizing all

            that stuff -<br>

            > it's your application's performance that matters!" 

            Alas, we're a generic<br>

            > "any kind of research computing" organization, so there

            are thousands <br>

            > of apps<br>

            > across all possible domains. <br>

            <br>

            &lt;rant&gt;<br>

            <br>

            I agree with you. I've always hated the "it depends on your

            application" <br>

            stock response in HPC. I think it's BS. Very few of us work

            in an <br>

            environment where we support only a handful of applications

            with very <br>

            similar characteristics. I say use standardized benchmarks

            that test <br>

            specific performance metrics (mem bandwidth or mem latency,

            etc.), <br>

            first, and then use a few applications to confirm what

            you're seeing <br>

            with those benchmarks.<br>

            <br>

            &lt;/rant&gt;<br>

          </blockquote>

          <div><br>

          </div>

          <div>It does depend on the application(s). At OLCF, we have

            hundreds of applications. Some pound the network and some do

            not. Because we are a Leadership Computing Facility, a user

            cannot get any time on the machine unless they can scale to

            20% and ideally to 100% of the system. We have several apps

            with FFTs which become all-to-alls in MPI. Because of this,

            ideally we want a non-blocking fat-tree (i.e., Clos)

            topology. Every other topology is a compromise. That said, a

            full Clos is 2x or more in cost compared to other common

            topologies (e.g., dragonfly or a 2:1 oversubscribed,

            fat-tree). If your workload is small jobs that can fit in a

            rack, for example, then by all means save some money and get

            an oversubscribed fat-tree, dragonfly, etc. If your jobs

            need to use the full machine and they have large message

            collectives, then you have to bite the bullet and spend more

            on network and less on compute and/or storage.</div>

          <div><br>

          </div>

          <div>To assess the usage of our parallel file systems, we run

            with Darshan installed and it captures data from each MPI

            job (each job step within a job). We do not have similar

            tools to determine how the network is being used (e.g., how

            much bandwidth do we need, what communication patterns).

            When I was at Myricom and we were releasing Myri-10G, I

            benchmarked several ISV codes on 2G versus 10G. If I

            remember, Fluent did not benefit from the extra bandwidth,

            but PowerFlow did a lot. </div>

          <div><br>

          </div>

          <div>My point is that "It depends" may not be a satisfying

            answer, but it is realistic.</div>

        </div>

      </div>

    </blockquote>

    <p>I don't disagree with you that different apps stress a cluster in

      different ways. I've seen a lot of that myself. What I'm saying is

      that designing a cluster around only a handful of applications is

      not practical or possible for most clusters, since the same

      cluster will most likely be supporting apps at different ends of

      the spectrum(s). I've had numerous discussions with users who

      don't think IB is worth it because if we buy Ethernet we can afford more

      cores. That may be fine for their embarrassingly parallel

      application, but what about the user with the tightly-coupled MD

      application? <br>

    </p>

    <p>I always recommend going with the best networking you can afford,

      because having better networking won't hurt the apps that don't

      need it, but the apps that DO need it will definitely notice it

      when it's not there. <br>

    </p>

    <p>Like you, I have seen the cost difference in going from

      non-blocking to 2:1 oversubscription. Once you get beyond a couple

      of switches, it becomes significantly more money to go from 2:1 to

      non-blocking. When going from 2:1 to 3:1, though, the savings

      isn't really as much (at least for the cluster sizes I've spec'ed

      out), so it doesn't seem worth it to go from 2:1 to 3:1. Going

      non-blocking within a rack and going with oversubscription between

      racks (like SDSC did with the Comet cluster) isn't that bad an

      idea if budget is an issue. <br>

    </p>
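    <p>A rough way to see why the savings shrink from 2:1 to 3:1 is to
      count switches in a simplified two-tier fat-tree. This is only a
      sketch under my own assumptions: the function name, the 40-port
      radix, and the 480-node size are made up for illustration, every
      switch is the same model, and a real design would also have to
      spread the uplinks evenly across the spines. <br>
    </p>
<pre>
```python
import math


def two_tier_fat_tree(nodes, radix, ratio):
    """Count switches for a two-tier fat-tree with ratio:1 oversubscription.

    Simplified model (an assumption, not a vendor sizing tool): every switch
    has `radix` ports, each leaf uses `ratio` down-links per up-link, and
    spine switches use all of their ports for leaf up-links.
    """
    uplinks_per_leaf = radix // (ratio + 1)
    downlinks_per_leaf = ratio * uplinks_per_leaf
    leaves = math.ceil(nodes / downlinks_per_leaf)
    spines = math.ceil(leaves * uplinks_per_leaf / radix)
    return {
        "leaves": leaves,
        "spines": spines,
        "switches": leaves + spines,
        "uplink_cables": leaves * uplinks_per_leaf,
    }


# Hypothetical cluster: 480 nodes on 40-port switches.
for ratio in (1, 2, 3):
    print(f"{ratio}:1", two_tier_fat_tree(nodes=480, radix=40, ratio=ratio))
```
</pre>
    <p>With these made-up inputs the model gives 36 switches for
      non-blocking, 26 for 2:1, and 20 for 3:1, so the first step saves
      10 switches and the second only 6. Actual pricing depends on
      cables, optics, and vendor, none of which this counts. <br>
    </p>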

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAL8g0jJ93QeDBpkWQ+85ZfDxf-U8CY=nnwgNJxuR6XS1O6GM9A@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">>

            Another interesting topic is that nodes are becoming

            many-core - any <br>

            > thoughts? <br>

            <br>

            Core counts are getting too high to be of use in HPC. High

            core-count <br>

            processors sound great until you realize that all those

            cores are now <br>

            competing for the same memory bandwidth and network bandwidth,

            neither of <br>

            which increase with core-count.<br>

            <br>

            Last April we were evaluating test systems from different

            vendors for a <br>

            cluster purchase. One of our test users does a lot of CFD

            simulations <br>

            that are very sensitive to mem bandwidth. While he was

            getting a 50% <br>

            speed up in AMD compared to Intel (which makes sense since

            AMDs require <br>

            12 DIMM slots to be filled instead of Intel's 8), he asked

            us to consider <br>

            servers with LESS cores. Even with the AMDs, he was

            saturating the <br>

            memory bandwidth before scaling to all the cores, causing

            his <br>

            performance to plateau. For him, buying cheaper processors

            with lower <br>

            core-counts was better for him, since the savings would

            allow us to buy <br>

            additional nodes, which would be more beneficial to him.<br>

          </blockquote>

          <div><br>

          </div>

          <div>We see this as well in DOE especially when GPUs are doing

            a significant amount of the work.</div>

        </div>

      </div>

    </blockquote>

    <p>Yeah, I noticed that Frontier and Aurora will actually be

      single-socket systems w/ "only" 64 cores. <br>

    </p>
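    <p>The plateau in the quoted CFD example can be sketched with a
      simple bandwidth-bound model. The 12 and 8 come from the message
      above, read here as memory channels per socket. The 38.4 GB/s per
      channel (DDR5-4800) and the 10 GB/s per-core demand are my
      assumptions for illustration, not measurements, and the function
      names are hypothetical. <br>
    </p>
<pre>
```python
def saturating_cores(channels, channel_gbs, core_gbs):
    """Cores at which a purely bandwidth-bound code uses all socket bandwidth."""
    return channels * channel_gbs / core_gbs


def ideal_speedup(cores, channels, channel_gbs, core_gbs):
    """Speedup over one core: linear until memory bandwidth runs out, then flat."""
    return min(cores, saturating_cores(channels, channel_gbs, core_gbs))


CHANNEL_GBS = 38.4  # assumption: DDR5-4800, 4800 MT/s x 8 bytes per transfer
CORE_GBS = 10.0     # assumption: hypothetical per-core demand of the CFD code

for cores in (16, 32, 64, 96):
    s12 = ideal_speedup(cores, 12, CHANNEL_GBS, CORE_GBS)
    s8 = ideal_speedup(cores, 8, CHANNEL_GBS, CORE_GBS)
    print(f"{cores:3d} cores: 12-channel {s12:5.1f}x, 8-channel {s8:5.1f}x")
```
</pre>
    <p>Once both sockets are saturated, the ratio of the two plateaus is
      12/8 = 1.5, which is consistent with the 50% speedup mentioned
      above, and any cores past the saturation point add nothing in this
      model. <br>
    </p>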

    <blockquote type="cite"

cite="mid:CAL8g0jJ93QeDBpkWQ+85ZfDxf-U8CY=nnwgNJxuR6XS1O6GM9A@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div><br>

          </div>

          <div>Scott</div>

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">&lt;snip&gt;<br>

            --<br>

            Prentice<br>

            <br>

            <br>

            On 1/16/24 5:19 PM, Mark Hahn wrote:<br>

            > Hi all,<br>

            > Just wondering if any of you have numbers (or

            experience) with<br>

            > modern high-speed COTS ethernet.<br>

            ><br>

            > Latency mainly, but perhaps also message rate.  Also

            ease of use<br>

            > with open-source products like OpenMPI, maybe Lustre?<br>

            > Flexibility in configuring clusters in the >= 1k

            node range?<br>

            ><br>

            > We have a good idea of what to expect from Infiniband

            offerings,<br>

            > and are familiar with scalable network topologies.<br>

            > But vendors seem to think that high-end ethernet

            (100-400Gb) is <br>

            > competitive...<br>

            ><br>

            > For instance, here's an excellent study of Cray/HP

            Slingshot (non-COTS):<br>

            > <a href="https://arxiv.org/pdf/2008.08886.pdf"

              rel="noreferrer" target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://arxiv.org/pdf/2008.08886.pdf</a><br>

            > (half rtt around 2 us, but this paper has great stuff

            about <br>

            > congestion, etc)<br>

            ><br>

            > Yes, someone is sure to say "don't try characterizing

            all that stuff -<br>

            > it's your application's performance that matters!" 

            Alas, we're a generic<br>

            > "any kind of research computing" organization, so there

            are thousands <br>

            > of apps<br>

            > across all possible domains.<br>

            ><br>

            > Another interesting topic is that nodes are becoming

            many-core - any <br>

            > thoughts?<br>

            ><br>

            > Alternatively, are there other places to ask? Reddit or

            something less <br>

            > "greybeard"?<br>

            ><br>

            > thanks, mark hahn<br>

            > McMaster U / SharcNET / ComputeOntario / DRI Alliance

            Canada<br>

            ><br>

            > PS: the snarky name "NVidiband" just occurred to me;

            too soon?<br>

            > _______________________________________________<br>

            > Beowulf mailing list, <a

              href="mailto:Beowulf@beowulf.org" target="_blank"

              moz-do-not-send="true" class="moz-txt-link-freetext">Beowulf@beowulf.org</a>

            sponsored by Penguin Computing<br>

            > To change your subscription (digest mode or

            unsubscribe) visit <br>

            > <a

              href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf"

              rel="noreferrer" target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>


          </blockquote>

        </div>

      </div>

    </blockquote>

  </body>

</html>