<div dir="ltr">This article might be interesting here:<div><br></div><div><a href="https://www.dell.com/support/article/en-uk/sln319015/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance?lang=en">https://www.dell.com/support/article/en-uk/sln319015/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance?lang=en</a><br></div><div><br></div><div>And Hello Joshua. Long time no see.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 25 Oct 2020 at 23:11, Joshua Mora <<a href="mailto:joshua_mora@usa.net">joshua_mora@usa.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Reach out to AMD,<br>
they have specific instructions (including BIOS/OS settings) and even binaries<br>
for getting the best performance.<br>
Don't go by trial and error, as it is very time consuming.<br>
BLIS also has multiple parameters because of its nested loops, so you may<br>
have to try multiple configurations to find the optimal performance.<br>
Just reach out to them.<br>
<br>
Joshua<br>
<br>
------ Original Message ------<br>
Received: 04:30 PM CDT, 08/14/2020<br>
From: Richard Walsh <<a href="mailto:rbwcnslt@gmail.com" target="_blank">rbwcnslt@gmail.com</a>><br>
To: Beowulf List <<a href="mailto:beowulf@beowulf.org" target="_blank">beowulf@beowulf.org</a>><br>
Subject: [Beowulf] Best case performance of HPL on EPYC 7742 processor ...<br>
<br>
> All,<br>
> <br>
> What have people achieved on this SKU on a single-node using the stock<br>
> HPL 2.3 source... ??<br>
> <br>
> I have seen a variety of performance claims, even as high as 90% of its<br>
> nominal per-node peak of 4.608 TFLOPs. I can now get above 80% of peak,<br>
> but not higher. I have heard that getting higher values requires special<br>
> BIOS settings, including turning off SMT, which allows the chip to turbo<br>
> higher. Remember this is not the 7542 processor, which has 32 cores per<br>
> chip and the same bandwidth per socket as the 7742, and which can turbo<br>
> to over 100% of nominal peak for HPL.<br>
> <br>
> If people have gotten higher single node numbers ... what is your recipe<br>
> ... ??<br>
> <br>
> I am particularly interested in BIOS settings, and maybe surprise settings<br>
> in the HPL.dat file. Do higher performing runs require using close to the<br>
> maximum memory on the node ... ?? As this is single-node, I would not<br>
> expect the choice of MPI to make a difference.<br>
> <br>
> To get to 80% with SMT on in the BIOS, I am building with an older Intel<br>
> compiler and MKL that still recognize MKL_DEBUG_CPU_TYPE=5.<br>
> Running so that the number of MPI ranks on the node matches the<br>
> number of CCXs seems to give the best numbers.<br>
> <br>
> Following the tuning instructions from AMD for using BLIS and GCC for<br>
> the build does not get me there.<br>
> <br>
> Thanks,<br>
> <br>
> Richard Walsh<br>
> <br>
<br>
> _______________________________________________<br>
> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
> To change your subscription (digest mode or unsubscribe) visit<br>
<a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
<br>
<br>
</blockquote></div>
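<div><br></div><div>For reference, the nominal per-node peak of 4.608 TFLOPs quoted in the thread can be reproduced with a short calculation. This is only a sketch under stated assumptions: a dual-socket node, 64 cores per socket and a 2.25 GHz base clock for the 7742, and 16 double-precision FLOPs per cycle per core. The function names are illustrative and do not come from any library.</div>

```python
# Sketch: nominal double-precision peak of a dual-socket EPYC 7742 node.
# All constants below are assumptions, not values measured in this thread.
SOCKETS = 2               # assumed dual-socket node
CORES_PER_SOCKET = 64     # EPYC 7742 core count
BASE_CLOCK_GHZ = 2.25     # base clock; turbo clocks are ignored here
FLOPS_PER_CYCLE = 16      # assumed: 2 FMA units * 4 doubles * 2 ops per FMA


def peak_gflops(sockets, cores, ghz, flops_per_cycle):
    """Nominal peak in GFLOPS: sockets * cores * clock * FLOPs per cycle."""
    return sockets * cores * ghz * flops_per_cycle


def hpl_efficiency(measured_gflops, peak):
    """Fraction of nominal peak reached by a measured HPL result."""
    return measured_gflops / peak


PEAK = peak_gflops(SOCKETS, CORES_PER_SOCKET, BASE_CLOCK_GHZ, FLOPS_PER_CYCLE)

if __name__ == "__main__":
    print("nominal peak (GFLOPS):", PEAK)
    # GFLOPS an HPL run must report to reach a given share of nominal peak
    for share in (0.80, 0.90):
        print(f"{share:.0%} of peak:", round(PEAK * share, 1), "GFLOPS")
```

<div>Because the calculation uses the base clock, a part that sustains clocks above base during HPL can report more than 100% of this nominal figure.</div>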