<div dir="ltr">Andrew, the answer is very much yes. I guess you are looking at the interface between 'traditional' HPC, which uses workload schedulers, and Kubernetes-style clusters, which use containers.<div>First, I would ask whether you are coming from the point of view of someone who wants to build a cluster at home or in your company using kit which you already have.</div><div>Or are you a company which wants to set up an AI infrastructure?</div><div><br></div><div>By the way, I think you are thinking of a CPU cluster and scaling out using Beowulf concepts.</div><div>In that case, take a look at Horovod <a href="https://github.com/horovod/horovod">https://github.com/horovod/horovod</a></div><div>One thing though - for AI applications it is common to deploy Beowulf clusters whose servers have GPUs as part of their specification.</div><div><br></div><div><br><div>I think you will soon find yourself overwhelmed with options and opinions.</div><div>To begin, join the hpc.social community and introduce yourself in the introductions channel on Slack.</div><div>I would start with the following resources:</div><div><br></div><div><a href="https://www.clustermonkey.net/">https://www.clustermonkey.net/</a><br></div><div><a href="https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/">https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/</a><br></div><div><a href="https://catalog.ngc.nvidia.com/containers">https://catalog.ngc.nvidia.com/containers</a><br></div><div><a href="https://openhpc.community/">https://openhpc.community/</a><br></div><div><a href="https://ciq.com/">https://ciq.com/</a><br></div><div><a href="https://qlustar.com/">https://qlustar.com/</a><br></div><div><a href="https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf">https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf</a><br></div><div><a 
href="https://omnia-doc.readthedocs.io/en/latest/index.html">https://omnia-doc.readthedocs.io/en/latest/index.html</a><br></div><div><br></div><div>Does anyone know if the Bright Easy8 licenses are available? I would say that building a test cluster with Easy8 would be the quickest way to get some hands-on experience.</div><div><br></div><div>You should, of course, also consider cloud providers:</div><div><a href="https://aws.amazon.com/hpc/parallelcluster/">https://aws.amazon.com/hpc/parallelcluster/</a><br></div><div><a href="https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro">https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro</a><br></div><div><a href="https://cloud.google.com/solutions/hpc">https://cloud.google.com/solutions/hpc</a><br></div><div><a href="https://go.oracle.com/LP=134426">https://go.oracle.com/LP=134426</a><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <<a href="mailto:andrew.falgout@gmail.com">andrew.falgout@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">So I'm interested to see if a Beowulf Cluster could be used for Machine Learning, LLM training, and LLM inference.  Anyone know where a good entry point is for learning Beowulf Clustering?  
<br clear="all"><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><p style="margin:0in 0in 12pt;font-size:11pt;font-family:Calibri,sans-serif"><span style="font-size:11pt"><br>./Andrew Falgout<br></span>KG5GRX</p></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <<a href="mailto:mdidomenico4@gmail.com" target="_blank">mdidomenico4@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">just a mailing list as far as i know.  it used to get a lot more<br>
traffic, but seems to have simmered down quite a bit<br>
<br>
On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <<a href="mailto:andrew.falgout@gmail.com" target="_blank">andrew.falgout@gmail.com</a>> wrote:<br>
><br>
> Just curious, do we have a discord channel, or just a mailing list?<br>
><br>
><br>
> ./Andrew Falgout<br>
> KG5GRX<br>
><br>
><br>
><br>
> On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <<a href="mailto:mdidomenico4@gmail.com" target="_blank">mdidomenico4@gmail.com</a>> wrote:<br>
>><br>
>> ugh, as someone who worked the front lines in the 00's i got a front row<br>
>> seat to the interconnect mud slinging...  but frankly if they're going<br>
>> to come out of the gate with a product named "Ultra Ethernet", i smell<br>
>> a loser... :) (sarcasm...)<br>
>><br>
>> <a href="https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/" rel="noreferrer" target="_blank">https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/</a><br>
>> _______________________________________________<br>
>> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
>> To change your subscription (digest mode or unsubscribe) visit <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
</blockquote></div>
</blockquote></div>