[Beowulf] interconnect wars... again...

John Hearns hearnsj at gmail.com
Mon Jul 31 19:35:25 UTC 2023


Please keep the list updated on what you find.

On Mon, 31 Jul 2023 at 20:08, Andrew Falgout <andrew.falgout at gmail.com>
wrote:

> Not ignoring you guys, literally have been moving.  We had to downsize,
> I've got no power and even my main computer is still powered off.  There's
> nothing more eerie than a quiet computer room.  I'm on an old laptop that I
> threw Linux Mint 21 on to be here now.  Okay.. to introduce myself a bit
> more.
> I've been doing Linux for a long time, but I've been in a silo for a long
> time.  I feel like I've left so many skills unused that I can't trust them
> anymore.  So I'm mentally just marking my cache as dirty and going to
> relearn as much as I can.  Great information so far.
>
> I have hardware and storage space to play with (Dell R930, 112 cores/600 GB
> of RAM).  The issue is that getting a graphics card into them for compute is
> not proving to be ideal.  I have about 4 of these machines, and I'd like to
> play around with clustering, learning how to properly and securely plan
> and implement them.  I've played around with Docker, and have used multiple
> Docker servers with Portainer.  Next, when I get an electrician to install
> power, I'll try to set up a Kubernetes cluster.
> When I can get something with some decent compute (not this laptop), I'd
> like to learn how to train a small LLM using the cluster if
> possible.  I know I can do a good bit slowly with the CPU.  If I can get a
> GPU in the mix, I'd use that to speed things up.
> Again.. I would like to apologize for being quiet for so long.  I'll try
> to toss an "ack" in there from my phone if nothing else.
>
>
> ./Andrew Falgout
> KG5GRX
>
>
> On Mon, Jul 31, 2023 at 6:10 AM John Hearns <hearnsj at gmail.com> wrote:
>
>> A quick ack would be nice.
>>
>> On Fri, 28 Jul 2023, 06:38 John Hearns, <hearnsj at gmail.com> wrote:
>>
>>> Andrew, the answer is very much yes. I guess you are looking at the
>>> interface of 'traditional' HPC which uses workload schedulers and
>>> Kubernetes style clusters which use containers.
>>> Firstly I would ask if you are coming from the point of view of someone
>>> who wants to build a cluster in your home or company using kit which you
>>> already have.
>>> Or are you a company which wants to set up an AI infrastructure?
>>>
>>> By the way, I think you are thinking of a CPU cluster and scaling out
>>> using Beowulf concepts.
>>> In that case you are looking at Horovod
>>> https://github.com/horovod/horovod
>>> One thing though - for AI applications it is common to deploy Beowulf
>>> clusters which have servers with GPUs as part of their specification.
>>>
>>>
>>> I think it will be clear to you soon that you will be overwhelmed with
>>> options and opinions.
>>> Firstly, join the hpc.social community and introduce yourself in the
>>> Slack introductions channel.
>>> I would start with the following resources:
>>>
>>> https://www.clustermonkey.net/
>>> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
>>> https://catalog.ngc.nvidia.com/containers
>>> https://openhpc.community/
>>> https://ciq.com/
>>> https://qlustar.com/
>>>
>>> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
>>> https://omnia-doc.readthedocs.io/en/latest/index.html
>>>
>>> Does anyone know if the Bright Easy8 licenses are available? I would say
>>> that building a test cluster with Easy8 would be the quickest way to get
>>> some hands-on experience.
>>>
>>> You should of course consider cloud providers:
>>> https://aws.amazon.com/hpc/parallelcluster/
>>>
>>> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
>>> https://cloud.google.com/solutions/hpc
>>> https://go.oracle.com/LP=134426
>>>
>>> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falgout at gmail.com>
>>> wrote:
>>>
>>>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>>>> Learning, LLM training, and LLM inference.  Anyone know where a good entry
>>>> point is for learning Beowulf Clustering?
>>>>
>>>>
>>>> ./Andrew Falgout
>>>> KG5GRX
>>>>
>>>>
>>>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <
>>>> mdidomenico4 at gmail.com> wrote:
>>>>
>>>>> just a mailing list as far as i know.  it used to get a lot more
>>>>> traffic, but seems to have simmered down quite a bit
>>>>>
>>>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <
>>>>> andrew.falgout at gmail.com> wrote:
>>>>> >
>>>>> > Just curious, do we have a discord channel, or just a mailing list?
>>>>> >
>>>>> >
>>>>> > ./Andrew Falgout
>>>>> > KG5GRX
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <
>>>>> > mdidomenico4 at gmail.com> wrote:
>>>>> >>
>>>>> >> ugh, as someone who worked the front lines in the 00's i got a front
>>>>> >> row seat to the interconnect mud slinging...  but frankly if they're
>>>>> >> going to come out of the gate with a product named "Ultra Ethernet",
>>>>> >> i smell a loser... :) (sarcasm...)
>>>>> >>
>>>>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>>>> >> _______________________________________________
>>>>> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>>> >> Computing
>>>>> >> To change your subscription (digest mode or unsubscribe) visit
>>>>> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>>>
>>>>
>>>