[Beowulf] [External] Re: HPCG benchmark, again
Prentice Bisbal
pbisbal at pppl.gov
Tue Mar 22 14:35:13 UTC 2022
Thanks for the explanation. I've always found the documentation on HPCG
to be lacking, and what I remember reading about it said it's supposed
to be a more holistic approach to benchmarking, which I assumed meant it
stressed the whole system, not just one subsystem.
I'll do a search for presentations from the BOFs. If you can send me the
PDF you referenced below, I will be grateful.
Prentice
On 3/21/22 8:42 PM, Massimiliano Fatica wrote:
> No, HPCG is all memory bandwidth.
> You can see this in the old presentation linked below, where GPUs with
> basically no double precision perform on par with others that have 10x
> the double-precision performance.
>
> http://www.hpcg-benchmark.org/downloads/sc14/HPCG_BOF.pdf
>
> There were more examples during recent HPCG BOFs (I can't find the PDF
> online, but if you want I can send it to you).
> For example, if you look at the specs of a K80 (2x GK210, 1.4 TF DP and a
> 384-bit memory bus at 5 GHz) and an M40 (GM200, 0.2 TF DP and a 384-bit
> memory bus at 6 GHz), you might think that the K80 will be much faster.
> Exactly the opposite is true, and the results scale perfectly with
> memory bandwidth.
>
> *1 x K80 (2 GK210 GPUs), ECC enabled, clk=875*
> 2x1x1 process grid
> 256x256x256 local domain
> SpMV  =  49.1 GF ( 309.1 GB/s Effective)   24.5 GF_per ( 154.6 GB/s Effective)
> SymGS =  62.2 GF ( 480.2 GB/s Effective)   31.1 GF_per ( 240.1 GB/s Effective)
> total =  58.7 GF ( 445.3 GB/s Effective)   29.4 GF_per ( 222.7 GB/s Effective)
> final =  55.1 GF ( 417.5 GB/s Effective)   27.5 GF_per ( 208.8 GB/s Effective)
>
> *2 x M40 (2 GM200 GPUs), ECC enabled, clk=1114*
> 2x1x1 process grid
> 256x256x256 local domain
> SpMV  =  69.4 GF ( 437.2 GB/s Effective)   34.7 GF_per ( 218.6 GB/s Effective)
> SymGS =  83.7 GF ( 645.7 GB/s Effective)   41.8 GF_per ( 322.8 GB/s Effective)
> total =  79.6 GF ( 603.7 GB/s Effective)   39.8 GF_per ( 301.9 GB/s Effective)
> final =  74.2 GF ( 562.7 GB/s Effective)   37.1 GF_per ( 281.4 GB/s Effective)
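>
> To make that concrete, a quick back-of-the-envelope check in Python (spec
> numbers from the paragraph above, assuming the quoted 384-bit buses are per
> GPU die; the GF figures are the "final" lines from the two runs):
>
>     def spec_bw_gbs(bus_bits, gbps):
>         return bus_bits / 8 * gbps          # bus width in bytes * data rate
>
>     k80_bw = 2 * spec_bw_gbs(384, 5)        # 1 x K80 = 2 x GK210 -> 480 GB/s
>     m40_bw = 2 * spec_bw_gbs(384, 6)        # 2 x M40 = 2 x GM200 -> 576 GB/s
>     k80_dp, m40_dp = 1.4, 2 * 0.2           # peak FP64, TF
>     k80_gf, m40_gf = 55.1, 74.2             # "final" HPCG GF from the runs
>
>     print(f"peak FP64 ratio (M40s/K80): {m40_dp / k80_dp:.2f}x")  # ~0.29x
>     print(f"spec bandwidth ratio:       {m40_bw / k80_bw:.2f}x")  # ~1.20x
>     print(f"measured HPCG ratio:        {m40_gf / k80_gf:.2f}x")  # ~1.35x
>
> The M40 pair has less than a third of the K80's FP64 peak but ~20% more
> spec bandwidth, and the HPCG score follows the bandwidth, not the FLOPS.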
>
> Regarding Linpack: on CPU systems the trailing-matrix update is slow, so
> you can hide all the network traffic with the look-ahead if you have a
> decent network (most CPU-only systems on the list are not real HPC
> systems, just OEMs stuffing the list with cloud systems that have very
> poor networks).
> On accelerated systems (for example, GPU systems), the network becomes
> really critical.
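>
> As a rough illustration (a crude one-step model in Python with invented
> numbers, not HPL itself): compare the time of the trailing-matrix update
> against the time to broadcast a panel.
>
>     def update_time_s(n, nb, dgemm_tflops):
>         return 2 * n * n * nb / (dgemm_tflops * 1e12)  # ~2*n*n*nb flops
>
>     def bcast_time_s(n, nb, net_gbs):
>         return n * nb * 8 / (net_gbs * 1e9)            # n x nb doubles
>
>     n, nb = 100_000, 384              # remaining matrix size, block size
>     net = bcast_time_s(n, nb, 12.5)   # ~100 Gb/s network
>     cpu = update_time_s(n, nb, 2.0)   # ~2 TF/s sustained DGEMM, CPU node
>     gpu = update_time_s(n, nb, 50.0)  # ~50 TF/s sustained DGEMM, GPU node
>
>     print(f"bcast {net:.3f}s vs CPU update {cpu:.2f}s -> easily hidden")
>     print(f"bcast {net:.3f}s vs GPU update {gpu:.2f}s -> margin shrinks ~25x")
>
> The faster the update runs, the less room there is to hide communication,
> which is why the network matters so much more on accelerated systems.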
>
> Now, memory bw is the real limitation in most HPC workloads, so if I
> had to select a system, I would care more about memory bw than HPL.
>
> M
>
>
> On Mon, Mar 21, 2022 at 11:51 AM Prentice Bisbal via Beowulf
> <beowulf at beowulf.org> wrote:
>
> M,
>
> Isn't it more accurate to say that HPCG measures the whole system
> more realistically, and memory bandwidth happens to be the "rate
> limiting step" in just about all architectures? Even with LINPACK,
> which should be CPU-bound, the Top500 list shows that HPL results
> are affected by the network. For example, there's this article
> which is a bit old, but I think still applies (doing the same
> analysis on the current top500 list is on my to-do list, actually):
>
> https://www.nextplatform.com/2015/07/20/ethernet-will-have-to-work-harder-to-win-hpc/
>
> On 3/18/22 8:34 PM, Massimiliano Fatica wrote:
>> HPCG measures memory bandwidth, the FLOPS capability of the chip
>> is completely irrelevant.
>> Pretty much all the vendor implementations reach very similar
>> efficiency if you compare them to the available memory bandwidth.
>> There is some effect of the network at scale, but you need to
>> have a really large system to see it in play.
>>
>> M
>>
>> On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins
>> <bdobbins at gmail.com> wrote:
>>
>>
>> Hi Jorg,
>>
>> We (NCAR - weather/climate applications) tend to find that HPCG more
>> closely tracks the performance we see from hardware than Linpack, so it
>> definitely is of interest and watched, but our procurements tend to use
>> actual code that vendors run as part of the process, so we don't 'just'
>> use published HPCG numbers. Still, I'd say it's very much a useful
>> number.
>>
>> As one example, while I haven't seen HPCG numbers for the MI250X
>> accelerators, Prof. Matsuoka of RIKEN tweeted back in November that he
>> anticipated it would score around 0.4% of peak on HPCG, vs 2% on the
>> NVIDIA A100 (while the A64FX they use hits an impressive 3%):
>> https://twitter.com/ProfMatsuoka/status/1458159517590384640
>>
>> Why is that relevant? Well, /on paper/, the MI250X has ~96 TF FP64
>> with matrix operations, vs 19.5 TF on the A100. So, ~5x in theory, but
>> Prof. Matsuoka anticipated a ~5x lower fraction of peak on HPCG,
>> /erasing/ that differential. Now, surely /someone/ has HPCG numbers on
>> the MI250X, but I've not yet seen any. Would love to know what they
>> are. But absent that information, I tend to bet Matsuoka isn't far off
>> the mark.
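>>
>> The arithmetic, spelled out (peak figures as above; the HPCG fractions
>> are Matsuoka's estimates, not measurements):
>>
>>     mi250x_peak_tf, a100_peak_tf = 96.0, 19.5  # FP64 matrix peak, TF
>>     mi250x_frac, a100_frac = 0.004, 0.02       # ~0.4% vs ~2% of peak
>>
>>     print(f"MI250X: ~{mi250x_peak_tf * mi250x_frac:.2f} TF on HPCG")  # ~0.38
>>     print(f"A100:   ~{a100_peak_tf * a100_frac:.2f} TF on HPCG")      # ~0.39
>>
>> So the ~5x paper advantage comes out to roughly a dead heat.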
>>
>> Ultimately, it may help to know more about what kind of applications
>> you run - for memory-bound, CFD-like codes, HPCG tends to be pretty
>> representative.
>>
>> Maybe it's time to update the saying that 'numbers never
>> lie' to something more accurate - 'numbers never lie, but
>> they also rarely tell the whole story'.
>>
>> Cheers,
>> - Brian
>>
>>
>> On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
>> <sassy-work at sassy.formativ.net> wrote:
>>
>> Dear all,
>>
>> further to the emails back in 2020 around the HPCG benchmark test: as we
>> are in the process of getting a new cluster, I was wondering if somebody
>> else has in the meantime used that test to benchmark the performance of
>> their particular cluster.
>> From what I can see, the latest HPCG version is 3.1 from August 2019. I
>> have also noticed that their website has a link to download a version
>> which supports the latest A100 GPUs from NVIDIA:
>> https://www.hpcg-benchmark.org/software/view.html?id=280
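>>
>> (For anyone who hasn't tried it: the reference HPCG is driven by a small
>> hpcg.dat file. The first two lines are just title text, the third is the
>> local grid per MPI rank (each dimension a multiple of 8, and sized so the
>> problem is far too big for cache), and the fourth is the run time in
>> seconds; official submissions need at least 1800 s. Something along the
>> lines of:
>>
>>     HPCG benchmark input file
>>     Sandia National Laboratories; University of Tennessee, Knoxville
>>     104 104 104
>>     1800
>>
>> and then mpirun -np <ranks> ./xhpcg. The 104^3 local size is just the
>> example shipped with the source, if I remember right - size it to the
>> memory of the nodes you are comparing.)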
>>
>> What I was wondering is: has anybody else apart from Prentice tried that
>> test, and is it somehow useful, or does it just give you another set of
>> numbers?
>>
>> Our new cluster will not be in the same league as the supercomputers,
>> but we would like to have at least some kind of handle so we can compare
>> the various offers from vendors. My hunch is that the benchmark will
>> somehow (strongly?) depend on how it is tuned. As my former colleague
>> used to say: I am looking for some war stories (not very apt to say
>> these days!).
>>
>> Either way, I hope you are all well given the strange new world we are
>> living in right now.
>>
>> All the best from a spring-like, dark London
>>
>> Jörg
>>
>>
>>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>