<div dir="ltr">No, HPCG is all memory bandwidth. <div>You can see this old presentation where GPUs with basically no double precision, perform on par with others with 10x performance.</div><div><br></div><div><a href="http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf">http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf</a><br></div><div><br></div><div><div class="gmail-page" title="Page 7"><div class="gmail-section" style="color:rgb(0,0,0)">There were more examples during recent HPCG BOFs ( but I can't find the pdf online, if you want I can send them to you).</div><div class="gmail-section" style="color:rgb(0,0,0)">For example, if you look at the specs of a K80 ( 2xGK210 , <span style="color:rgb(34,34,34)">1.4TF DP and 384 bit memory bus at 5GHz</span> ) and M40 (GM200, 0.2TF DP and <span style="color:rgb(34,34,34)">384 bit memory bus at 6GHz), you may think that the K80 will much faster.</span> Exactly the opposite, and the results scale perfectly with memory bandwidth.</div><div class="gmail-section" style="color:rgb(0,0,0)"><br></div><b>1 x K80 (2 GK210 GPUs), ECC enabled, clk=875</b><br>2x1x1 process grid<br>256x256x256 local domain<br>SpMV = 49.1 GF ( 309.1 GB/s Effective) 24.5 GF_per ( 154.6 GB/s Effective) SymGS = 62.2 GF ( 480.2 GB/s Effective) 31.1 GF_per ( 240.1 GB/s Effective) total = 58.7 GF ( 445.3 GB/s Effective) 29.4 GF_per ( 222.7 GB/s Effective) final = 55.1 GF ( 417.5 GB/s Effective) 27.5 GF_per ( 208.8 GB/s Effective)<br><br><b>2 x M40 (2 GM200 GPUs), ECC enabled, clk=1114</b><br>2x1x1 process grid<br>256x256x256 local domain<br>SpMV = 69.4 GF ( 437.2 GB/s Effective) 34.7 GF_per ( 218.6 GB/s Effective) SymGS = 83.7 GF ( 645.7 GB/s Effective) 41.8 GF_per ( 322.8 GB/s Effective) total = 79.6 GF ( 603.7 GB/s Effective) 39.8 GF_per ( 301.9 GB/s Effective) final = 74.2 GF ( 562.7 GB/s Effective) 37.1 GF_per ( 281.4 GB/s Effective)</div><div class="gmail-page" title="Page 7"><br></div><div class="gmail-page" title="Page 7"><div class="gmail-section" style="color:rgb(0,0,0)">Regarding Linpack, on CPU systems the trailing matrix update is slow, you can hide all the network traffic with the look-ahead if you have a decent network (most CPU-only systems on the list are not real HPC systems, just some OEMs stuffing the list with cloud systems with very poor network).</div><div class="gmail-section" style="color:rgb(0,0,0)">On accelerated systems ( for example GPU), network becomes really critical.</div><div class="gmail-section" style="color:rgb(0,0,0)"><br></div><div class="gmail-section" style="color:rgb(0,0,0)">Now, memory bw is the real limitation in most HPC workloads, so if I had to select a system, I would care more about memory bw than HPL.</div><div class="gmail-section" style="color:rgb(0,0,0)"><br></div><div class="gmail-section" style="color:rgb(0,0,0)">M</div></div></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf <<a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf <beowulf@beowulf.org> wrote:

M,
Isn't it more accurate to say that HPCG measures the whole system more realistically, and that memory bandwidth happens to be the "rate-limiting step" on just about all architectures? Even with LINPACK, which should be CPU-bound, the Top500 list shows that HPL results are affected by the network. For example, there's this article, which is a bit old but I think still applies (doing the same analysis on the current Top500 list is on my to-do list, actually):

https://www.nextplatform.com/2015/07/20/ethernet-will-have-to-work-harder-to-win-hpc/
On 3/18/22 8:34 PM, Massimiliano Fatica wrote:
<div dir="ltr">HPCG measures memory bandwidth, the FLOPS
capability of the chip is completely irrelevant.
<div>Pretty much all the vendor implementations reach very
similar efficiency if you compare them to the available memory
bandwidth.</div>
<div>There is some effect of the network at scale, but you need
to have a really large system to see it in play.</div>
<div><br>
</div>
<div>M</div>
</div>
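If that model holds, it can be turned into a crude first-order estimator: sustained memory bandwidth divided by an empirical bytes-per-flop constant. The constant below is simply read off the K80 "final" line quoted earlier (417.5 GB/s effective / 55.1 GF ≈ 7.6 bytes per flop); this is a hypothetical sketch, not an official HPCG model.

```python
# Crude estimator: HPCG GF ~ sustained memory bandwidth / (bytes per flop).
# 7.6 bytes/flop is derived from the K80 'final' result quoted above.
BYTES_PER_FLOP = 417.5 / 55.1   # ~7.58

def est_hpcg_gflops(sustained_bw_gbs: float) -> float:
    """Rough HPCG estimate (GF) from sustained memory bandwidth (GB/s)."""
    return sustained_bw_gbs / BYTES_PER_FLOP

# Example: a node sustaining ~300 GB/s on STREAM-like access patterns.
print(f"~{est_hpcg_gflops(300):.0f} GF")   # ~40 GF
```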
On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins <bdobbins@gmail.com> wrote:
Hi Jörg,
We (NCAR, weather/climate applications) tend to find that HPCG tracks the performance we see from hardware more closely than Linpack does, so it is definitely of interest and watched. Our procurements tend to use actual codes that vendors run as part of the process, though, so we don't 'just' use published HPCG numbers. Still, I'd say it's very much a useful number.
As one example, while I haven't seen HPCG numbers for the MI250X accelerators, Prof. Matsuoka of RIKEN tweeted back in November that he anticipated it would score around 0.4% of peak on HPCG, vs. 2% on the NVIDIA A100 (while the A64FX they use hits an impressive 3%):

https://twitter.com/ProfMatsuoka/status/1458159517590384640
Why is that relevant? Well, *on paper*, the MI250X has ~96 TF of FP64 with matrix operations, vs. 19.5 TF on the A100. So, 5x in theory, but Prof. Matsuoka anticipated a ~5x differential in percent-of-peak on HPCG, *erasing* that advantage. Now, surely *someone* has HPCG numbers for the MI250X, but I've not yet seen any, and I would love to know what they are. Absent that information, I tend to bet Matsuoka isn't far off the mark.
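Spelling out that arithmetic (the percent-of-peak figures are from the tweet above; the peak FP64 numbers are the vendors' published specs):

```python
# Percent-of-peak on HPCG roughly cancels the on-paper FLOPS advantage.
mi250x = 96.0 * 0.004   # ~0.38 TF HPCG (96 TF peak FP64 matrix, at 0.4%)
a100   = 19.5 * 0.02    # ~0.39 TF HPCG (19.5 TF peak FP64, at 2%)
print(f"MI250X ~{mi250x:.2f} TF vs A100 ~{a100:.2f} TF: roughly a wash")
```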
Ultimately, it may help to know more about what kinds of applications you run: for memory-bound, CFD-like codes, HPCG tends to be pretty representative.
Maybe it's time to update the saying that 'numbers never lie' to something more accurate: 'numbers never lie, but they also rarely tell the whole story'.
Cheers,
  - Brian
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Mar 18, 2022 at
5:08 PM Jörg Saßmannshausen <<a href="mailto:sassy-work@sassy.formativ.net" target="_blank">sassy-work@sassy.formativ.net</a>>
wrote:<br>
</div>
Dear all,

Further to the emails back in 2020 about the HPCG benchmark, and since we are in the process of getting a new cluster, I was wondering whether anybody else has used that test in the meantime to benchmark the performance of a particular cluster.
From what I can see, the latest HPCG version is 3.1, from August 2019. I have also noticed that their website links to a version that supports the latest A100 GPUs from NVIDIA:

https://www.hpcg-benchmark.org/software/view.html?id=280
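For anyone who wants to try it: the reference implementation is driven by a small hpcg.dat input file, so a minimal run looks roughly like the sketch below. The 104^3 local grid and 60-second quick-run duration are just illustrative (official submissions require a much longer run, at least 1800 seconds), and the binary name and launcher depend on your build and MPI.

```
# hpcg.dat: two title lines, local grid nx ny nz, run time in seconds
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60
```

Then something like `mpiexec -np 4 ./xhpcg`; the results land in an HPCG-Benchmark-*.txt/.yaml file in the working directory.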
What I was wondering is: has anybody else, apart from Prentice, tried that test, and is it somehow useful, or does it just give you another set of numbers?
Our new cluster will not be in the same league as the supercomputers, but we would like to have at least some kind of handle so we can compare the various offers from vendors. My hunch is that the benchmark will somehow (strongly?) depend on how it is tuned. As my former colleague used to say: I am looking for some war stories (not a very apt phrase these days!).
Either way, I hope you are all well, given the strange new world we are living in right now.

All the best from a spring-like, dark London

Jörg
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf