[Beowulf] interconnect wars... again...

John Hearns hearnsj at gmail.com
Mon Jul 31 11:10:20 UTC 2023


A quick ack would be nice.

On Fri, 28 Jul 2023, 06:38 John Hearns, <hearnsj at gmail.com> wrote:

> Andrew, the answer is very much yes. I guess you are looking at the
> interface between 'traditional' HPC, which uses workload schedulers, and
> Kubernetes-style clusters, which use containers.
> Firstly, I would ask whether you are coming at this as someone who wants
> to build a cluster at home or at your company using kit you already have,
> or as a company that wants to set up an AI infrastructure.
>
> By the way, I think you are thinking of a CPU cluster and scaling out
> using Beowulf concepts.
> In that case you are looking at Horovod https://github.com/horovod/horovod
> One thing, though: for AI applications it is common to deploy Beowulf
> clusters whose servers include GPUs as part of their specification.
>
>
> I think it will be clear to you soon that you will be overwhelmed with
> options and opinions.
> To begin, join the hpc.social community and introduce yourself in the
> introductions channel on its Slack.
> I would start with the following resources:
>
> https://www.clustermonkey.net/
> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
> https://catalog.ngc.nvidia.com/containers
> https://openhpc.community/
> https://ciq.com/
> https://qlustar.com/
>
> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
> https://omnia-doc.readthedocs.io/en/latest/index.html
>
> Does anyone know if the Bright Easy8 licenses are available? I would say
> that building a test cluster with Easy8 would be the quickest way to get
> some hands-on experience.
>
> You should of course consider cloud providers:
> https://aws.amazon.com/hpc/parallelcluster/
>
> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
> https://cloud.google.com/solutions/hpc
> https://go.oracle.com/LP=134426
>
> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falgout at gmail.com>
> wrote:
>
>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>> Learning, LLM training, and LLM inference.  Anyone know where a good entry
>> point is for learning Beowulf Clustering?
>>
>>
>> ./Andrew Falgout
>> KG5GRX
>>
>>
>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <
>> mdidomenico4 at gmail.com> wrote:
>>
>>> just a mailing list as far as i know.  it used to get a lot more
>>> traffic, but seems to have simmered down quite a bit
>>>
>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <andrew.falgout at gmail.com>
>>> wrote:
>>> >
>>> > Just curious, do we have a discord channel, or just a mailing list?
>>> >
>>> >
>>> > ./Andrew Falgout
>>> > KG5GRX
>>> >
>>> >
>>> >
>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomenico4 at gmail.com> wrote:
>>> >>
>>> >> ugh, as someone who worked the front lines in the 00's i got a front-row
>>> >> seat to the interconnect mudslinging...  but frankly if they're going
>>> >> to come out of the gate with a product named "Ultra Ethernet", i smell
>>> >> a loser... :) (sarcasm...)
>>> >>
>>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>> >> _______________________________________________
>>> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> >> To change your subscription (digest mode or unsubscribe) visit
>>> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
>>
>