<div dir="ltr"><div dir="ltr">On Mon, Jan 22, 2024 at 11:16 AM Prentice Bisbal <<a href="mailto:pbisbal@pppl.gov">pbisbal@pppl.gov</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

  

    

  

  <div>

    <blockquote type="cite"><div dir="ltr"><div class="gmail_quote"><div><snip> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">>

            Another interesting topic is that nodes are becoming

            many-core - any <br>

            > thoughts? <br>

            <br>

            Core counts are getting too high to be of use in HPC. High

            core-count <br>

            processors sound great until you realize that all those

            cores are now <br>

            competing for the same memory bandwidth and network bandwidth,

            neither of <br>

            which increases with core-count.<br>

            <br>

            Last April we were evaluating test systems from different

            vendors for a <br>

            cluster purchase. One of our test users does a lot of CFD

            simulations <br>

            that are very sensitive to memory bandwidth. While he was

            getting a 50% <br>

            speed up in AMD compared to Intel (which makes sense since

            AMDs require <br>

            12 DIMM slots to be filled instead of Intel's 8), he asked

            us to consider <br>

            servers with FEWER cores. Even with the AMDs, he was

            saturating the <br>

            memory bandwidth before scaling to all the cores, causing

            his <br>

            performance to plateau. For him, buying cheaper processors

            with lower <br>

            core-counts was the better option, since the savings would

            allow us to buy <br>

            additional nodes, which would be more beneficial to him.<br>

          </blockquote>

          <div><br>

          </div>

          <div>We see this in DOE as well, especially when GPUs are doing

            a significant amount of the work.</div>

        </div>

      </div>

    </blockquote>

    <p>Yeah, I noticed that Frontier and Aurora will actually be

      single-socket systems w/ "only" 64 cores.</p></div></blockquote><div>Yes, a Frontier node has a <b>single CPU</b> socket and <b>four GPUs</b> (eight GPUs from the user's perspective, since each of the two Graphics Compute Dies (GCDs) on a GPU is presented as its own device). That works out to eight CPU cores per GCD. The FLOPS ratio between the CPU and the GPUs is roughly 1:100.</div><div><br></div><div>Note that an Aurora node has two CPUs and six GPUs. I am not sure whether the user sees six GPUs or more. The Aurora node is similar to our Summit node, but with more connectivity between the GPUs.</div></div></div>
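<div>For anyone who wants to play with the arithmetic in this thread, here is a rough Python sketch. The Frontier figures (64 cores, four GPUs, eight GCDs) come from the discussion above. The bandwidth figures (400 GB/s per node, 10 GB/s demanded per core) are purely hypothetical placeholders, not measurements from any system, and the function names are my own. The plateau model is the simplest possible one: speedup grows linearly with cores until their combined demand reaches the node's memory bandwidth.</div>

```python
import math


def cores_per_gcd(cpu_cores: int, gpus: int, gcds_per_gpu: int) -> float:
    """CPU cores available to drive each Graphics Compute Die on one node."""
    return cpu_cores / (gpus * gcds_per_gpu)


def saturation_core_count(node_bw_gbs: float, per_core_demand_gbs: float) -> int:
    """Smallest core count whose combined demand reaches the node bandwidth."""
    return math.ceil(node_bw_gbs / per_core_demand_gbs)


def modeled_speedup(cores: int, node_bw_gbs: float, per_core_demand_gbs: float) -> float:
    """Idealized speedup over one core: linear until memory bandwidth saturates."""
    return min(cores, node_bw_gbs / per_core_demand_gbs)


if __name__ == "__main__":
    # Frontier node as described in this thread: 64 cores, 4 GPUs, 2 GCDs each.
    print("cores per GCD:", cores_per_gcd(64, 4, 2))

    # Hypothetical bandwidth numbers, chosen only to illustrate the plateau.
    node_bw, per_core = 400.0, 10.0
    print("saturates at cores:", saturation_core_count(node_bw, per_core))
    for n in (16, 32, 64):
        print(n, "cores -> modeled speedup", modeled_speedup(n, node_bw, per_core))
```

<div>Under those made-up numbers the modeled speedup stops improving at 40 cores, which is the same shape as the CFD plateau Prentice described: past that point, the money is better spent on more nodes than on more cores per node.</div>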