<div dir="auto">Cool, thanks! </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 3, 2021, 6:25 PM Carlos Bederián &lt;<a href="mailto:carlos.bederian@unc.edu.ar">carlos.bederian@unc.edu.ar</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The Top500 has been listing wrong Rpeak values for most clusters for many years now, so I wouldn't dwell on it...<div><br></div><div>Take a Cascade Lake-based cluster like Frontera. Its listed Rpeak is 38,745.9 TFLOPS = 8008 nodes * 56 cores * 32 ops/cycle * 2.7 GHz.</div><div>But 2.7 GHz is the regular base frequency, and to do 32 ops/cycle you need to use AVX-512. All-core AVX-512 frequencies for a Xeon 8280 are 1.8 GHz base and 2.4 GHz turbo, so the achievable peak is 11-33% below the listed Rpeak.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 3, 2021 at 9:22 AM harsh_google lastname &lt;<a href="mailto:harshscience777@gmail.com" target="_blank" rel="noreferrer">harshscience777@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">But that would bring the theoretical performance to 160 TFLOPS per box, which also doesn't match!</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 3, 2021, 5:50 PM Carlos Bederián &lt;<a href="mailto:carlos.bederian@unc.edu.ar" target="_blank" rel="noreferrer">carlos.bederian@unc.edu.ar</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">A100 does 19.5 FP64 TFLOPS using tensor cores.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 3, 2021 at 9:08 AM harsh_google lastname &lt;<a href="mailto:harshscience777@gmail.com" rel="noreferrer noreferrer" target="_blank">harshscience777@gmail.com</a>&gt; 
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">
<div><div><div><span><span>I am calculating the theoretical peak (FP64) performance of the Nvidia DGX A100 system. <br></span></span></div><div><span><span><br></span></span></div><div><span><span>The A100 datasheet lists FP64 performance as 9.7 TFLOPS. <br></span></span></div><div><span><span>Two AMD 7742 CPUs will give 128 cores x 2.25 GHz base clock x 16 FP64 ops / cycle = 4.6 TFLOPS. <br></span></span></div><div><span><span>With 8 GPUs per box, this gives a total of 82.2 TFLOPS per DGX A100.</span></span></div><div><span><span><br></span></span></div></div><div><div><span><span>Here is my problem. For any system with DGX A100 on <a href="http://top500.org" rel="noreferrer noreferrer" target="_blank">top500.org</a>, the numbers just don't add up. For example, Selene has 560 DGX boxes, but its theoretical peak is listed as 79.2 PFLOPS, whereas I expect it to be 46 PFLOPS (i.e. 82.2 TFLOPS x 560). The same is true for any other DGX-based system listed on top500. What am I missing here?</span></span></div><div><span><span><br></span></span></div><div><span><span>Thanks!</span></span></div><div><span><span><br></span></span></div><div><span><span>Harsh Hemani<br></span></span></div></div></div>
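<div>For anyone who wants to replay the arithmetic in this post, here is a minimal Python sketch. It uses only the figures quoted in the thread (9.7 and 19.5 TFLOPS per GPU, 128 cores at 2.25 GHz with 16 FP64 ops/cycle, 560 boxes). The count of 8 GPUs per box is an assumption that matches the 82.2 TFLOPS total, and the function names are made up for illustration.</div>

```python
# Peak FP64 estimate for a DGX A100 box, built from figures quoted in this thread.

GPUS_PER_BOX = 8          # assumption: 8 A100 GPUs per box (matches the 82.2 total)
CPU_CORES = 128           # two 64-core AMD 7742 CPUs
CPU_BASE_GHZ = 2.25       # base clock used in the post
CPU_FLOPS_PER_CYCLE = 16  # FP64 ops per core per cycle, as stated in the post


def cpu_peak_tflops() -> float:
    """CPU contribution: cores * GHz * ops/cycle gives GFLOPS, so divide by 1000."""
    return CPU_CORES * CPU_BASE_GHZ * CPU_FLOPS_PER_CYCLE / 1000.0


def box_peak_tflops(gpu_tflops: float) -> float:
    """Peak of one box for a given per-GPU FP64 figure (in TFLOPS)."""
    return GPUS_PER_BOX * gpu_tflops + cpu_peak_tflops()


def cluster_peak_pflops(boxes: int, gpu_tflops: float) -> float:
    """Peak of a cluster of identical boxes, in PFLOPS."""
    return boxes * box_peak_tflops(gpu_tflops) / 1000.0


if __name__ == "__main__":
    # 9.7 is the plain FP64 datasheet figure, 19.5 the FP64 tensor-core figure.
    for label, gpu_tflops in (("FP64", 9.7), ("FP64 tensor core", 19.5)):
        per_box = box_peak_tflops(gpu_tflops)
        selene = cluster_peak_pflops(560, gpu_tflops)
        print(f"{label}: {per_box:.1f} TFLOPS per box, {selene:.1f} PFLOPS for 560 boxes")
```

<div>Neither per-GPU figure reproduces the listed 79.2 PFLOPS exactly, which is the mismatch this thread is about.</div>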
</div>
_______________________________________________<br>
Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" rel="noreferrer noreferrer" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
To change your subscription (digest mode or unsubscribe) visit <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer noreferrer noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
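<div>The Frontera arithmetic in Carlos's reply can be checked the same way. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the numbers quoted in the thread (8008 nodes, 56 cores per node, 32 ops/cycle, and clocks of 2.7, 2.4 and 1.8 GHz). It reports the gap under two conventions, because the percentage depends on whether the listed or the achievable value is the baseline.</div>

```python
# Rpeak check for Frontera using the figures quoted in this thread.

def rpeak_tflops(nodes: int, cores_per_node: int, flops_per_cycle: int, ghz: float) -> float:
    """nodes * cores * ops/cycle * GHz gives GFLOPS, so divide by 1000 for TFLOPS."""
    return nodes * cores_per_node * flops_per_cycle * ghz / 1000.0


def shortfall_vs_listed(listed: float, achievable: float) -> float:
    """How far the achievable peak falls below the listed one, in percent of listed."""
    return (listed - achievable) / listed * 100.0


def overstatement_vs_achievable(listed: float, achievable: float) -> float:
    """How far the listed peak exceeds the achievable one, in percent of achievable."""
    return (listed - achievable) / achievable * 100.0


if __name__ == "__main__":
    listed = rpeak_tflops(8008, 56, 32, 2.7)  # nominal base clock, as listed
    print(f"listed Rpeak: {listed:,.1f} TFLOPS")
    # All-core AVX-512 clocks quoted in the thread for the Xeon 8280.
    for label, ghz in (("AVX-512 turbo", 2.4), ("AVX-512 base", 1.8)):
        achievable = rpeak_tflops(8008, 56, 32, ghz)
        print(
            f"{label} ({ghz} GHz): {achievable:,.1f} TFLOPS, "
            f"{shortfall_vs_listed(listed, achievable):.1f}% below listed, "
            f"listed is {overstatement_vs_achievable(listed, achievable):.1f}% above it"
        )
```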