[Beowulf] Best case performance of HPL on EPYC 7742 processor ...
e.scott.atchley at gmail.com
Mon Aug 17 08:07:16 PDT 2020
I do not have any specific HPL hints.
I would suggest setting NUMAs-Per-Socket to 4 (NPS-4) in the BIOS. I would
try running 16 processes, one per CCX - two per CCD, with an OpenMP depth
Dell's HPC blog has a few articles on tuning Rome:
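
Separately, as a rough sketch of the one-rank-per-CCX layout suggested above: the snippet below lists a core range for each MPI rank. It assumes a dual-socket 7742 node with 64 cores per socket, 4 cores per CCX (so 16 CCXs per socket), SMT siblings left unused, and cores numbered contiguously within each CCX. None of that numbering is stated in this thread, so check the real topology with lstopo or numactl --hardware first. The function name, its parameters, and the thread count per rank are illustrative assumptions, not values taken from the message.

```python
def ccx_rank_layout(sockets=2, cores_per_socket=64, cores_per_ccx=4):
    """Return one (first_core, last_core) range per MPI rank, one rank per CCX.

    Assumes cores are numbered contiguously within each CCX and that SMT
    siblings are not used; verify against the real topology on the node.
    """
    if cores_per_socket % cores_per_ccx != 0:
        raise ValueError("cores_per_socket must be a multiple of cores_per_ccx")
    total_cores = sockets * cores_per_socket
    return [(first, first + cores_per_ccx - 1)
            for first in range(0, total_cores, cores_per_ccx)]


layout = ccx_rank_layout()
ranks_per_socket = len(layout) // 2  # 16 CCXs per socket under the assumptions above
omp_threads = 4                      # assumed: one OpenMP thread per core in a CCX

for rank, (first, last) in enumerate(layout):
    print(f"rank {rank:2d}: cores {first}-{last}, OMP_NUM_THREADS={omp_threads}")
```

The printed ranges are only a planning aid; the actual binding would still have to be expressed through whatever pinning options the MPI launcher in use provides.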
On Fri, Aug 14, 2020 at 5:30 PM Richard Walsh <rbwcnslt at gmail.com> wrote:
> What have people achieved on this SKU on a single-node using the stock
> HPL 2.3 source... ??
> I have seen a variety of performance claims even as high as 90% of its
> per node peak of 4.608 TFLOPs. I can now get above 80% of peak, but not
> I have heard that to get higher values special BIOS settings are required,
> such as turning off SMT, which allows the chip to turbo higher. Remember, this
> is not the 7542 processor, which has 32 cores per chip and the same bandwidth
> per socket as the 7742 and can turbo to over 100% of nominal peak for HPL.
> If people have gotten higher single node numbers ... what is your recipe
> ... ??
> I am particularly interested in BIOS settings, and maybe surprise settings
> in the HPL.dat file. Do higher performing runs require using close to the
> maximum memory on the node ... ?? As this is single-node, I would not
> expect choice of MPI to make a difference.
> To get to 80% with SMT on in the BIOS, I am building with an older Intel
> compiler and MKL that still recognizes the MKL_DEBUG_CPU_TYPE=5 setting.
> Running with the number of MPI ranks on the node equal to the number of
> CCXs seems to give the best numbers.
> Following the tuning instructions from AMD for using BLIS and GCC for
> the build does not get me there.
> Richard Walsh
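
For reference, the 4.608 TFLOPs per-node peak quoted above can be reproduced with a back-of-envelope calculation. The sketch below assumes a dual-socket node, 64 cores per socket, the 2.25 GHz base clock, and 16 double-precision FLOPs per cycle per core (two 256-bit FMA pipes on Zen 2). Those inputs are my assumptions and are not spelled out in the thread; the function name is made up for illustration.

```python
def peak_gflops(sockets, cores_per_socket, base_ghz, flops_per_cycle):
    """Nominal double-precision peak in GFLOP/s at the base clock."""
    return sockets * cores_per_socket * base_ghz * flops_per_cycle


# Assumed values for a dual-socket EPYC 7742 node (not stated in the thread).
peak = peak_gflops(sockets=2, cores_per_socket=64, base_ghz=2.25, flops_per_cycle=16)
print(f"nominal peak: {peak:.0f} GFLOP/s")

# HPL targets corresponding to the efficiencies mentioned in the thread.
for fraction in (0.80, 0.90):
    print(f"{fraction:.0%} of peak: {fraction * peak:.1f} GFLOP/s")
```

Under these assumptions the nominal peak comes out to 4608 GFLOP/s, which matches the per-node figure in the question. Because it is computed at the base clock, a part that sustains a higher clock during HPL can exceed 100% of it.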