[Beowulf] interconnect wars... again...

John Hearns hearnsj at gmail.com
Fri Jul 28 05:38:35 UTC 2023


Andrew, the answer is very much yes. I guess you are looking at the
interface between 'traditional' HPC, which uses workload schedulers, and
Kubernetes-style clusters, which use containers.
Before anything else, I would ask whether you are coming at this as someone
who wants to build a cluster at home or at your company using kit which you
already have, or as a company which wants to set up an AI infrastructure.

By the way, I think you are thinking of a CPU cluster and scaling out using
Beowulf concepts.
In that case, take a look at Horovod: https://github.com/horovod/horovod
One thing though - for AI applications it is common to deploy Beowulf
clusters whose servers include GPUs as part of their specification.
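
To give a feel for what scaling out training across nodes means, here is a
minimal sketch of data-parallel training in plain Python. It is not
Horovod's API. The workers are simulated in one process, and the names
local_gradient, allreduce_mean and train are made up for illustration. Each
worker computes a gradient on its own shard of the data, the gradients are
averaged (the job an allreduce does across real nodes), and every worker
applies the same update.

```python
from typing import List, Tuple

Shard = Tuple[List[float], List[float]]


def local_gradient(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of the mean squared error for y ~ w * x on one worker's shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an allreduce: every worker ends up holding the mean."""
    return sum(values) / len(values)


def train(shards: List[Shard], steps: int = 200, lr: float = 0.01) -> float:
    """Synchronous data-parallel SGD with simulated workers (one per shard)."""
    w = 0.0
    for _ in range(steps):
        # Each simulated worker computes a gradient on its own data only.
        grads = [local_gradient(w, xs, ys) for xs, ys in shards]
        # Averaging keeps all workers' copies of w identical after the update.
        w -= lr * allreduce_mean(grads)
    return w


# Four hypothetical workers, each holding an equal-sized shard of y = 3x.
xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
shards = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, 8, 2)]
w_final = train(shards)
```

With equal-sized shards the averaged gradient is the same as the gradient
over the whole data set, so adding workers splits the work per step without
changing the result. On a real cluster the averaging step is where the
interconnect matters.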


I think it will soon be clear to you that you will be overwhelmed with
options and opinions.
To begin with, join the hpc.social community and introduce yourself in the
introductions channel on its Slack.
I would start with the following resources:

https://www.clustermonkey.net/
https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
https://catalog.ngc.nvidia.com/containers
https://openhpc.community/
https://ciq.com/
https://qlustar.com/
https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
https://omnia-doc.readthedocs.io/en/latest/index.html

Does anyone know if the Bright Easy8 licenses are still available? I would
say that building a test cluster with Easy8 would be the quickest way to get
some hands-on experience.

You should of course consider cloud providers:
https://aws.amazon.com/hpc/parallelcluster/
https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
https://cloud.google.com/solutions/hpc
https://go.oracle.com/LP=134426

On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falgout at gmail.com>
wrote:

> So I'm interested to see if a Beowulf Cluster could be used for Machine
> Learning, LLM training, and LLM inference.  Anyone know where a good entry
> point is for learning Beowulf Clustering?
>
>
> ./Andrew Falgout
> KG5GRX
>
>
> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <mdidomenico4 at gmail.com>
> wrote:
>
>> just a mailing list as far as i know.  it used to get a lot more
>> traffic, but seems to have simmered down quite a bit
>>
>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <andrew.falgout at gmail.com>
>> wrote:
>> >
>> > Just curious, do we have a discord channel, or just a mailing list?
>> >
>> >
>> > ./Andrew Falgout
>> > KG5GRX
>> >
>> >
>> >
>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomenico4 at gmail.com> wrote:
>> >>
>> >> ugh, as someone who worked the front lines in the 00's i got a front row
>> >> seat to the interconnect mud slinging...  but frankly if they're going
>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>> >> a loser... :) (sarcasm...)
>> >>
>> >>
>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>> >> _______________________________________________
>> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>