[Beowulf] [External] Re: HPCG benchmark, again
Prentice Bisbal
pbisbal at pppl.gov
Tue Mar 22 14:35:13 UTC 2022
Thanks for the explanation. I've always found the documentation on HPCG
to be lacking, and what I remember reading about it said it's supposed
to be a more holistic approach to benchmarking, which I assumed meant it
stressed the whole system, not just one subsystem.
I'll do a search for presentations from the BOFs. If you can send me the
PDF you referenced below, I will be grateful.
Prentice
On 3/21/22 8:42 PM, Massimiliano Fatica wrote:
> No, HPCG is all memory bandwidth.
> You can see this in the old presentation linked below, where GPUs with
> basically no double precision perform on par with others that have 10x
> the double-precision performance.
>
> http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf
>
> There were more examples during recent HPCG BOFs (I can't find the PDF
> online, but if you want I can send it to you).
> For example, if you look at the specs of a K80 (2x GK210, 1.4 TF DP and a
> 384-bit memory bus at 5 GHz) and an M40 (GM200, 0.2 TF DP and a 384-bit
> memory bus at 6 GHz), you might think that the K80 will be much faster.
> Exactly the opposite is true, and the results scale perfectly with
> memory bandwidth.
>
> *1 x K80 (2 GK210 GPUs), ECC enabled, clk=875*
> 2x1x1 process grid
> 256x256x256 local domain
> SpMV  =  49.1 GF ( 309.1 GB/s Effective)   24.5 GF_per ( 154.6 GB/s Effective)
> SymGS =  62.2 GF ( 480.2 GB/s Effective)   31.1 GF_per ( 240.1 GB/s Effective)
> total =  58.7 GF ( 445.3 GB/s Effective)   29.4 GF_per ( 222.7 GB/s Effective)
> final =  55.1 GF ( 417.5 GB/s Effective)   27.5 GF_per ( 208.8 GB/s Effective)
>
> *2 x M40 (2 GM200 GPUs), ECC enabled, clk=1114*
> 2x1x1 process grid
> 256x256x256 local domain
> SpMV  =  69.4 GF ( 437.2 GB/s Effective)   34.7 GF_per ( 218.6 GB/s Effective)
> SymGS =  83.7 GF ( 645.7 GB/s Effective)   41.8 GF_per ( 322.8 GB/s Effective)
> total =  79.6 GF ( 603.7 GB/s Effective)   39.8 GF_per ( 301.9 GB/s Effective)
> final =  74.2 GF ( 562.7 GB/s Effective)   37.1 GF_per ( 281.4 GB/s Effective)
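>
> To make that concrete, a quick back-of-the-envelope check in Python (spec
> numbers from the paragraph above, assuming the quoted 384-bit buses are per
> GPU die; the GF figures are the "final" lines from the two runs):
>
>     def spec_bw_gbs(bus_bits, gbps):
>         return bus_bits / 8 * gbps          # bus width in bytes * data rate
>
>     k80_bw = 2 * spec_bw_gbs(384, 5)        # 1 x K80 = 2 x GK210 -> 480 GB/s
>     m40_bw = 2 * spec_bw_gbs(384, 6)        # 2 x M40 = 2 x GM200 -> 576 GB/s
>     k80_dp, m40_dp = 1.4, 2 * 0.2           # peak FP64, TF
>     k80_gf, m40_gf = 55.1, 74.2             # "final" HPCG GF from the runs
>
>     print(f"peak FP64 ratio (M40s/K80): {m40_dp / k80_dp:.2f}x")  # ~0.29x
>     print(f"spec bandwidth ratio:       {m40_bw / k80_bw:.2f}x")  # ~1.20x
>     print(f"measured HPCG ratio:        {m40_gf / k80_gf:.2f}x")  # ~1.35x
>
> The M40 pair has less than a third of the K80's FP64 peak but ~20% more
> spec bandwidth, and the HPCG score follows the bandwidth, not the FLOPS.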
>
> Regarding Linpack: on CPU systems the trailing-matrix update is slow, so
> you can hide all the network traffic with the look-ahead if you have a
> decent network (most CPU-only systems on the list are not real HPC
> systems, just OEMs stuffing the list with cloud systems that have very
> poor networks).
> On accelerated systems (for example, GPU systems), the network becomes
> really critical.
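>
> As a rough illustration (a crude one-step model in Python with invented
> numbers, not HPL itself): compare the time of the trailing-matrix update
> against the time to broadcast a panel.
>
>     def update_time_s(n, nb, dgemm_tflops):
>         return 2 * n * n * nb / (dgemm_tflops * 1e12)  # ~2*n*n*nb flops
>
>     def bcast_time_s(n, nb, net_gbs):
>         return n * nb * 8 / (net_gbs * 1e9)            # n x nb doubles
>
>     n, nb = 100_000, 384              # remaining matrix size, block size
>     net = bcast_time_s(n, nb, 12.5)   # ~100 Gb/s network
>     cpu = update_time_s(n, nb, 2.0)   # ~2 TF/s sustained DGEMM, CPU node
>     gpu = update_time_s(n, nb, 50.0)  # ~50 TF/s sustained DGEMM, GPU node
>
>     print(f"bcast {net:.3f}s vs CPU update {cpu:.2f}s -> easily hidden")
>     print(f"bcast {net:.3f}s vs GPU update {gpu:.2f}s -> margin shrinks ~25x")
>
> The faster the update runs, the less room there is to hide communication,
> which is why the network matters so much more on accelerated systems.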
>
> Now, memory bw is the real limitation in most HPC workloads, so if I
> had to select a system, I would care more about memory bw than HPL.
>
> M
>
>
> On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf
> <beowulf at beowulf.org> wrote:
>
> M,
>
> Isn't it more accurate to say that HPCG measures the whole system
> more realistically, and memory bandwidth happens to be the "rate
> limiting step" in just about all architectures? Even with LINPACK,
> which should be CPU-bound, the Top500 list shows that HPL results
> are affected by the network. For example, there's this article
> which is a bit old, but I think still applies (doing the same
> analysis on the current top500 list is on my to-do list, actually):
>
> https://www.nextplatform.com/2015/07/20/ethernet-will-have-to-work-harder-to-win-hpc/
>
> On 3/18/22 8:34 PM, Massimiliano Fatica wrote:
>> HPCG measures memory bandwidth, the FLOPS capability of the chip
>> is completely irrelevant.
>> Pretty much all the vendor implementations reach very similar
>> efficiency if you compare them to the available memory bandwidth.
>> There is some effect of the network at scale, but you need to
>> have a really large system to see it in play.
>>
>> M
>>
>> On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins
>> <bdobbins at gmail.com> wrote:
>>
>>
>> Hi Jorg,
>>
>> We (NCAR - weather/climate applications) tend to find that HPCG more
>> closely tracks the performance we see from hardware than Linpack, so it
>> definitely is of interest and watched, but our procurements tend to use
>> actual code that vendors run as part of the process, so we don't 'just'
>> use published HPCG numbers. Still, I'd say it's very much a useful
>> number.
>>
>> As one example, while I haven't seen HPCG numbers for the MI250X
>> accelerators, Prof. Matsuoka of RIKEN tweeted back in November that he
>> anticipated it would score around 0.4% of peak on HPCG, vs 2% on the
>> NVIDIA A100 (while the A64FX they use hits an impressive 3%):
>> https://twitter.com/ProfMatsuoka/status/1458159517590384640
>>
>> Why is that relevant? Well, /on paper/, the MI250X has ~96 TF FP64
>> with matrix operations, vs 19.5 TF on the A100. So, ~5x in theory, but
>> Prof. Matsuoka anticipated a ~5x lower fraction of peak on HPCG,
>> /erasing/ that differential. Now, surely /someone/ has HPCG numbers on
>> the MI250X, but I've not yet seen any. Would love to know what they
>> are. But absent that information, I tend to bet Matsuoka isn't far off
>> the mark.
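>>
>> The arithmetic, spelled out (peak figures as above; the HPCG fractions
>> are Matsuoka's estimates, not measurements):
>>
>>     mi250x_peak_tf, a100_peak_tf = 96.0, 19.5  # FP64 matrix peak, TF
>>     mi250x_frac, a100_frac = 0.004, 0.02       # ~0.4% vs ~2% of peak
>>
>>     print(f"MI250X: ~{mi250x_peak_tf * mi250x_frac:.2f} TF on HPCG")  # ~0.38
>>     print(f"A100:   ~{a100_peak_tf * a100_frac:.2f} TF on HPCG")      # ~0.39
>>
>> So the ~5x paper advantage comes out to roughly a dead heat.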
>>
>> Ultimately, it may help to know more about what kind of applications
>> you run - for memory-bound, CFD-like codes, HPCG tends to be pretty
>> representative.
>>
>> Maybe it's time to update the saying that 'numbers never
>> lie' to something more accurate - 'numbers never lie, but
>> they also rarely tell the whole story'.
>>
>> Cheers,
>> - Brian
>>
>>
>> On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
>> <sassy-work at sassy.formativ.net> wrote:
>>
>> Dear all,
>>
>> further to the emails back in 2020 around the HPCG benchmark test: as we
>> are in the process of getting a new cluster, I was wondering if somebody
>> else has in the meantime used that test to benchmark the performance of
>> their particular cluster.
>> From what I can see, the latest HPCG version is 3.1 from August 2019. I
>> have also noticed that their website has a link to download a version
>> which supports the latest A100 GPUs from NVIDIA:
>> https://www.hpcg-benchmark.org/software/view.html?id=280
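>>
>> (For anyone who hasn't tried it: the reference HPCG is driven by a small
>> hpcg.dat file. The first two lines are just title text, the third is the
>> local grid per MPI rank (each dimension a multiple of 8, and sized so the
>> problem is far too big for cache), and the fourth is the run time in
>> seconds; official submissions need at least 1800 s. Something along the
>> lines of:
>>
>>     HPCG benchmark input file
>>     Sandia National Laboratories; University of Tennessee, Knoxville
>>     104 104 104
>>     1800
>>
>> and then mpirun -np <ranks> ./xhpcg. The 104^3 local size is just the
>> example shipped with the source, if I remember right - size it to the
>> memory of the nodes you are comparing.)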
>>
>> What I was wondering is: has anybody else apart from Prentice tried that
>> test, and is it somehow useful, or does it just give you another set of
>> numbers?
>>
>> Our new cluster will not be in the same league as the supercomputers,
>> but we would like to have at least some kind of handle so we can compare
>> the various offers from vendors. My hunch is that the benchmark will
>> somehow (strongly?) depend on how it is tuned. As my former colleague
>> used to say: I am looking for some war stories (not very apt to say
>> these days!).
>>
>> Either way, I hope you are all well given the strange new world we are
>> living in right now.
>>
>> All the best from a spring-like, dark London
>>
>> Jörg
>>
>>
>>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>