[Beowulf] Best case performance of HPL on EPYC 7742 processor ...

Richard Walsh rbwcnslt at gmail.com
Fri Aug 14 14:29:02 PDT 2020


What have people achieved on this SKU on a single node using the stock
HPL 2.3 source ... ??

I have seen a variety of performance claims, even as high as 90% of its
per-node peak of 4.608 TFLOPs.  I can now get above 80% of peak, but not
higher.  I have heard that getting higher values requires special BIOS
settings, such as turning off SMT, which allows the chip to turbo higher.
Remember, this is not the 7542 processor, which has 32 cores per chip and
the same bandwidth per socket as the 7742, and which can turbo to over
100% of nominal peak for HPL.
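
For anyone checking the arithmetic, here is a minimal sketch of where the
4.608 TFLOPs figure comes from.  It assumes a dual-socket node, 64 cores per
socket, the 2.25 GHz base clock, and 16 double-precision FLOPs per cycle per
core (two 256-bit FMA pipes); the function name is just illustrative.

```python
# Theoretical double-precision peak for a node, in GFLOP/s.
# Assumptions (not all stated in the post): 2 sockets, 64 cores per socket,
# 2.25 GHz base clock, 16 DP FLOPs per cycle per core.

def peak_gflops(sockets, cores_per_socket, base_ghz, flops_per_cycle=16):
    """Nominal peak = sockets * cores * clock (GHz) * FLOPs per cycle."""
    return sockets * cores_per_socket * base_ghz * flops_per_cycle

peak = peak_gflops(sockets=2, cores_per_socket=64, base_ghz=2.25)
print(f"peak: {peak / 1000:.3f} TFLOP/s")

# What the 80% and 90% efficiency levels mean in absolute terms.
for fraction in (0.80, 0.90):
    print(f"{fraction:.0%} of peak: {fraction * peak / 1000:.3f} TFLOP/s")
```

Because the peak is computed at base clock, a part that sustains a clock
above base during HPL can report more than 100% of this nominal number.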

If people have gotten higher single node numbers ... what is your recipe
... ??

I am particularly interested in BIOS settings, and in any surprising settings
in the HPL.dat file.  Do higher performing runs require using close to the
maximum memory on the node ... ??  As this is single-node, I would not
expect the choice of MPI to make a difference.
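
To make the memory question concrete, here is a rough sketch of the usual
rule of thumb for picking the HPL problem size N from a memory fraction: the
N x N matrix of 8-byte doubles should fit in that fraction of RAM, with N
rounded down to a multiple of the block size NB.  The 256 GiB node size and
NB = 224 are placeholder assumptions, not values from my runs.

```python
import math

def hpl_problem_size(mem_gib, fraction=0.8, nb=224):
    """Largest N (multiple of nb) whose N*N doubles fit in fraction of RAM.

    mem_gib, fraction, and nb are illustrative inputs, not recommendations.
    """
    mem_bytes = mem_gib * 2**30
    max_elements = int(fraction * mem_bytes) // 8   # 8 bytes per double
    n = math.isqrt(max_elements)                    # side of the square matrix
    return (n // nb) * nb                           # round down to NB multiple

# Compare problem sizes for a few memory fractions on a hypothetical
# 256 GiB node.
for fraction in (0.5, 0.8, 0.9):
    print(f"fraction {fraction}: N = {hpl_problem_size(256, fraction)}")
```

Running the same build at several of these N values would show directly
whether efficiency keeps climbing as the matrix approaches full memory.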

To get to 80% with SMT on in the BIOS, I am building with an older Intel
compiler and MKL that still recognizes the MKL_DEBUG_CPU_TYPE=5 setting.
Running so that the number of MPI ranks on the node matches the
number of CCXs seems to give the best numbers.
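
As a sketch of what one rank per CCX works out to: the 7742 has 8 CCDs per
socket, 2 CCXs per CCD, and 4 cores per CCX, so a dual-socket node (assumed
here) gives 32 ranks with 4 cores each.  The grid helper below just picks the
most nearly square P x Q with P <= Q, which is the usual HPL guidance; it is
an illustration, not necessarily the grid I ran with.

```python
import math

def ccx_layout(sockets=2, ccds_per_socket=8, ccx_per_ccd=2, cores_per_ccx=4):
    """Return (MPI ranks, threads per rank) for one rank per CCX.

    Defaults describe a dual-socket EPYC 7742 node (dual socket is assumed).
    """
    ranks = sockets * ccds_per_socket * ccx_per_ccd
    return ranks, cores_per_ccx

def process_grid(ranks):
    """Most nearly square factorization P x Q of ranks, with P <= Q."""
    p = math.isqrt(ranks)
    while ranks % p:        # step down until p divides ranks evenly
        p -= 1
    return p, ranks // p

ranks, threads = ccx_layout()
p, q = process_grid(ranks)
print(f"{ranks} ranks x {threads} threads, HPL.dat grid P={p} Q={q}")
```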

Following the tuning instructions from AMD for using BLIS and GCC for
the build does not get me there.


