[Beowulf] AMD and AVX512

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Sun Jun 20 22:45:26 UTC 2021


Dear all,

same here: I should have joined the discussion earlier, but I am currently
recovering from surgery for a trapped ulnar nerve, so I need to avoid long
stretches of typing.
As I think it is quite apt, I would like to let you know about this upcoming
talk (copy & paste):

**********
*Performance Optimizations & Best Practices for AMD Rome and Milan CPUs in HPC 
Environments*
- date & time: Fri July 2nd 2021 - 16:00-17:30 UTC
- speakers: Evan Burness and Jithin Jose (Principal Program Managers for
High-Performance Computing in Microsoft Azure)

More information available at
https://github.com/easybuilders/easybuild/wiki/EasyBuild-tech-talks-IV:-AMD-Rome-&-Milan

The talk will be presented via a Zoom session, which registered attendees can 
join, and will be streamed (+ recorded) via the EasyBuild YouTube channel.
Q&A via the #tech-talks channel in the EasyBuild Slack.

Please register (free of charge) if you plan to attend, via:
https://webappsx.ugent.be/eventManager/events/ebtechtalkamdromemilan
The Zoom link will only be shared with registered attendees.
**********

These talks are genuine tech talks, not sales talks, and all of the ones I
have been to were very informative and friendly. So it might be a good idea to
ask some questions there.

All the best

Jörg

On Sunday, 20 June 2021, 18:28:25 BST, Mikhail Kuzminsky wrote:
> I apologize - I should have written earlier, but I cannot always work
> with my broken right hand. It seems to me that a reasonable basis for
> discussing AMD EPYC performance could be the performance data in the
> Daresbury benchmark from M. Guest. Yes, newer versions of the AMD EPYC
> and Xeon Scalable processors have appeared since then, as have new
> compiler versions. However, Intel already had AVX-512 support at that
> time, while AMD had 256-bit AVX2.
> Of course, peak performance is not as important as application
> performance. There are applications whose performance is not limited
> by vector work; there AVX-512 may not be needed. In AI tasks, vector
> work does matter, and GPUs are often used there. For AI, on the other
> hand, the Daresbury benchmark is less relevant. In Zen 4, AMD appeared
> to be planning to support 512-bit vectors. But good linear algebra
> performance does not always require a GPU. In quantum chemistry, you
> can get a speedup from vectorization on the V100 of, say, 2x, but how
> much more expensive is the GPU?
> Of course, support for 512-bit vectors is a plus, but you really need
> to look at application performance and cost (including power
> consumption). I prefer to look at the A64FX now, although applications
> may need to be rebuilt. Servers with the A64FX are on sale now, but the
> price is very important.
> 
> In message from John Hearns <hearnsj at gmail.com> (Sun, 20 Jun 2021
> 06:38:06 +0100):
> > Regarding benchmarking real world codes on AMD, every year Martyn Guest
> > presents a comprehensive set of benchmark studies to the UK Computing
> > Insights Conference.
> > I suggest a Sunday afternoon with the beverage of your choice is a good
> > time to settle down and take time to read these or watch the
> > presentation.
> >
> > 2019
> > https://www.scd.stfc.ac.uk/SiteAssets/Pages/CIUK-2019-Presentations/Martyn_Guest.pdf
> > 
> > 
> > 2020 Video session
> > https://ukri.zoom.us/rec/share/ajvsxdJ8RM1wzpJtnlcypw4OyrZ9J27nqsfAG7eW49Ehq_Z5igat_7gj21Ge8gWu.78Cd9I1DNIjVViPV?startTime=1607008552000
> > 
> > Skylake / Cascade Lake / AMD Rome
> > 
> > The slides for 2020 do exist - as I remember all the slides from all talks
> > are grouped together, but I cannot find them.
> > Watch the video - it is an excellent presentation.
> > 
> > On Sat, 19 Jun 2021 at 16:49, Gerald Henriksen <ghenriks at gmail.com> wrote:
> >> On Wed, 16 Jun 2021 13:15:40 -0400, you wrote:
> >> >The answer given, and I'm
> >> >not making this up, is that AMD listens to their users and gives the
> >> >users what they want, and right now they're not hearing any demand for
> >> >AVX512.
> >> >
> >> >Personally, I call BS on that one. I can't imagine anyone in the HPC
> >> >community saying "we'd like processors that offer only 1/2 the floating
> >> >point performance of Intel processors".
> >> 
> >> I suspect that is marketing speak, which roughly translates to not
> >> that no one has asked for it, but rather requests haven't reached a
> >> threshold where the requests are viewed as significant enough.
> >> 
> >> >Sure, AMD can offer more cores,
> >> >but with only AVX2, you'd need twice as many cores as Intel processors,
> >> >all other things being equal.
> >> 
> >> But of course all other things aren't equal.
> >> 
> >> AVX512 is a mess.
> >> 
> >> Look at the Wikipedia page(*) and note that AVX512 means different
> >> things depending on the processor implementing it.
> >> 
> >> So what does the poor software developer target?
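One way to see which AVX-512 subsets a given machine exposes is to look at the CPU feature flags. The following is a minimal sketch, assuming a Linux host where /proc/cpuinfo has a "flags" line; the function name avx512_subsets is made up for illustration, and the sample flag strings are shortened examples rather than output from any specific processor.

```python
from typing import List


def avx512_subsets(cpuinfo_text: str) -> List[str]:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text.

    Only the first "flags" line is inspected (assumption: all cores of the
    machine report the same flags).
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


# Shortened, illustrative flag lines (not copied from a real machine).
SAMPLE_WITH_AVX512 = "flags\t\t: fpu sse2 avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl"
SAMPLE_AVX2_ONLY = "flags\t\t: fpu sse2 avx avx2 fma"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            found = avx512_subsets(fh.read())
    except OSError:
        found = []  # not on Linux, or /proc is not readable
    print("AVX-512 subsets reported:", found if found else "none")
```

A build or dispatch script could use such a list to decide which code path to enable; compilers and libraries typically offer their own detection as well.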
> >> 
> >> Or that it can, for heat reasons, cause CPU frequency reductions,
> >> meaning real world performance may not match theoretical - thus easier
> >> to just go with GPUs.
> >> 
> >> The result is that most of the world is quite happily (at least for
> >> now) ignoring AVX512 and going with GPUs as necessary - particularly
> >> given the convenient libraries that Nvidia offers.
> >> 
> >> >I compared a server with dual AMD EPYC 7H12 processors (128 cores) to
> >> >a server with quad Intel Xeon 8268 processors (96 cores).
> >> > 
> >> >From what I've heard, the AMD processors run much hotter than the Intel
> >> >processors, too, so I imagine a FLOPS/Watt comparison would be even less
> >> >favorable to AMD.
> >> 
> >> Spec sheets would indicate AMD runs hotter, but then again you
> >> benchmarked twice as many Intel processors.
> >> 
> >> So, per spec sheets for your processors above:
> >> 
> >> AMD - 280W - 2 processors means system 560W
> >> Intel - 205W - 4 processors means system 820W
> >> 
> >> (and then you also need to factor in purchase price).
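The spec-sheet arithmetic above can be written out as a small sketch. It uses only the figures quoted in this thread (280 W and 205 W per socket, 128 and 96 cores in total); the Server class and its field names are made up for illustration, and TDP is a spec-sheet value for the CPU sockets only, not measured power for the whole system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    sockets: int
    tdp_watts: int    # per-socket TDP from the spec sheet
    total_cores: int  # cores across all sockets

    @property
    def system_tdp_watts(self) -> int:
        # Sum of the CPU TDPs only; memory, disks, fans etc. are ignored.
        return self.sockets * self.tdp_watts

    @property
    def watts_per_core(self) -> float:
        return self.system_tdp_watts / self.total_cores


amd = Server("2x AMD EPYC 7H12", sockets=2, tdp_watts=280, total_cores=128)
intel = Server("4x Intel Xeon 8268", sockets=4, tdp_watts=205, total_cores=96)

if __name__ == "__main__":
    for s in (amd, intel):
        print(f"{s.name}: {s.system_tdp_watts} W total, "
              f"{s.watts_per_core:.2f} W per core")
```

Dividing a measured application throughput by these totals would give a rough performance-per-watt figure, but that needs benchmark numbers that are not part of this thread.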
> >> 
> >> >An argument can be made that calculations that lend themselves to
> >> >vectorization should be done on GPUs instead of the main processors, but
> >> >the last time I checked, GPU jobs are still memory limited, and
> >> >moving data in and out of GPU memory can still take time, so I can see
> >> >situations where for large amounts of data using CPUs would be preferred
> >> >over GPUs.
> >> 
> >> AMD's latest chips support PCIe 4 while Intel is still stuck on PCIe 3,
> >> which may or may not make a difference.
> >> 
> >> But despite all of the above and the other replies, it is AMD who
> >> has been winning the HPC contracts of late, not Intel.
> >> 
> >> * - https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >>
> >> To change your subscription (digest mode or unsubscribe) visit
> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
