<div dir="ltr">Okay, that is the same slot Summit/Sierra use for the EDR HCA. You may want to check out our paper at SC19 where we look at several new features in EDR as well as how to best stripe data over the four virtual ports.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 10, 2019 at 1:49 PM Bill Wichser <<a href="mailto:bill@princeton.edu">bill@princeton.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Actually 12 per rack. The reasoning was that there were 2 connections <br>
per host to the top-of-rack switch, leaving 12 uplinks to the two tier-0 <br>
switches, 6 to each.<br>
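<br>
A quick sketch of that per-rack arithmetic (the 36-port top-of-rack switch <br>
size is an assumption on my part; the node and link counts are the ones above):<br>
<pre>
# Per-rack InfiniBand wiring; the 36-port EDR top-of-rack switch size is an
# assumption, the node and link counts come from the description above.
nodes_per_rack = 12
links_per_node = 2                # one EDR link per CPU socket
tor_ports      = 36               # assumed port count of the EDR ToR switch
tier0_switches = 2

downlinks = nodes_per_rack * links_per_node   # 24 host-facing ports
uplinks   = tor_ports - downlinks             # 12 ports left for uplinks
per_tier0 = uplinks // tier0_switches         # 6 uplinks to each tier-0 switch

print(downlinks, uplinks, per_tier0)          # 24 12 6
print(f"{downlinks // uplinks}:1 between racks, 1:1 within a rack")
</pre>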
<br>
For the IB cards, they are a specially flavored Mellanox part that attaches to <br>
the PCIe gen4 slots, 8 lanes each. And since 8 lanes of gen4 have the same <br>
bandwidth as 16 lanes of gen3, we get full EDR to both CPU sockets.<br>
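<br>
A rough back-of-the-envelope check of that claim (nominal line rates with <br>
128b/130b encoding, not measured numbers from this machine):<br>
<pre>
# Nominal unidirectional PCIe bandwidth vs. the EDR line rate.
ENCODING = 128 / 130                        # 128b/130b line coding (gen3 and gen4)
GT_PER_LANE = {"gen3": 8.0, "gen4": 16.0}   # GT/s per lane

def pcie_gb_per_s(gen, lanes):
    """Nominal unidirectional PCIe bandwidth in GB/s."""
    return GT_PER_LANE[gen] * ENCODING * lanes / 8

edr_gb_per_s = 100 / 8                      # EDR InfiniBand: 100 Gb/s ~= 12.5 GB/s

print(round(pcie_gb_per_s("gen3", 16), 2))  # 15.75 GB/s
print(round(pcie_gb_per_s("gen4", 8), 2))   # 15.75 GB/s, same as gen3 x16
print(round(pcie_gb_per_s("gen4", 8) - edr_gb_per_s, 2))  # ~3.25 GB/s of headroom
</pre>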
<br>
Bill<br>
<br>
On 10/10/19 12:57 PM, Scott Atchley wrote:<br>
> That is better than 80% peak, nice.<br>
> <br>
> Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third rack?<br>
> <br>
> You went with a single-port HCA per socket and not the shared, dual-port <br>
> HCA in the shared PCIe slot?<br>
> <br>
> On Thu, Oct 10, 2019 at 8:48 AM Bill Wichser <<a href="mailto:bill@princeton.edu" target="_blank">bill@princeton.edu</a>> wrote:<br>
> <br>
> Thanks for the kind words. Yes, we installed something more like a mini-Sierra<br>
> machine, which is air-cooled. There are 46 IBM AC922 nodes: two sockets and<br>
> 4 V100s per node, with each socket running SMT4. So two 16-core chips,<br>
> 32 cores/node, 128 threads per node. The GPUs all use NVLink.<br>
> <br>
> There are two EDR connections per host, each tied to a CPU: 1:1 within a<br>
> rack of 12 and 2:1 between racks. We have a 2 PB scratch filesystem running<br>
> GPFS. Each node also has a 3 TB NVMe card for local scratch.<br>
> <br>
> And we're running Slurm as our scheduler.<br>
> <br>
> We'll see if it makes the Top500 in November. It fits there today, but<br>
> who knows what else got on there since June. With the help of NVIDIA we<br>
> managed to get 1.09 PF across 45 nodes.<br>
> <br>
> Bill<br>
> <br>
> On 10/10/19 7:45 AM, Michael Di Domenico wrote:<br>
> > for those that may not have seen<br>
> ><br>
> ><br>
> <a href="https://insidehpc.com/2019/10/traverse-supercomputer-to-accelerate-fusion-research-at-princeton/" rel="noreferrer" target="_blank">https://insidehpc.com/2019/10/traverse-supercomputer-to-accelerate-fusion-research-at-princeton/</a><br>
> ><br>
> > Bill Wichser and Prentice Bisbal are frequent contributors to the<br>
> > list. Congrats on the acquisition. It's nice to see more HPC expansion<br>
> > in our otherwise barren hometown... :)<br>
> ><br>
> > Maybe one of them will pass along some detail on the machine...<br>
> _______________________________________________<br>
> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
> To change your subscription (digest mode or unsubscribe) visit<br>
> <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
> <br>
</blockquote></div>