[Beowulf] interconnect wars... again...

Andrew Falgout andrew.falgout at gmail.com
Mon Jul 31 19:07:49 UTC 2023


Not ignoring you guys, I've literally been moving.  We had to downsize,
I've got no power, and even my main computer is still powered off.  There's
nothing more eerie than a quiet computer room.  I'm on an old laptop that I
threw Linux Mint 21 on just to be here now.  Okay, to introduce myself a bit
more:
I've been doing Linux for a long time, but I've been in a silo for a long
time.  I've gone so long without using many of my skills that I can't trust
them anymore.  So I'm mentally just marking my cache as dirty and going to
relearn as much as I can.  Great information so far.

I have hardware and storage space to play with (Dell R930, 112 cores/600 GB
of RAM).  The issue is that getting a graphics card into them for compute is
not proving to be ideal.  I have about 4 of these machines, and I'd like to
play around with clustering, learning how to properly and securely plan
and implement clusters.  I've played around with Docker, and have used
multiple Docker servers with Portainer.  Next, once I get an electrician to
install power, I plan to try to set up a Kubernetes cluster.
When I can get something with some decent compute (not this laptop), I'd
like to learn how to train a small LLM using the cluster, if
possible.  I know I can do a good bit slowly with the CPU.  If I can get a
GPU in the mix, I'd use that to speed things up.
Again, I would like to apologize for being quiet for so long.  I'll try to
toss an "ack" in there from my phone if nothing else.


./Andrew Falgout
KG5GRX


On Mon, Jul 31, 2023 at 6:10 AM John Hearns <hearnsj at gmail.com> wrote:

> A quick ack would be nice.
>
> On Fri, 28 Jul 2023, 06:38 John Hearns, <hearnsj at gmail.com> wrote:
>
>> Andrew, the answer is very much yes. I guess you are looking at the
>> interface between 'traditional' HPC, which uses workload schedulers, and
>> Kubernetes-style clusters, which use containers.
>> Firstly I would ask if you are coming from the point of view of someone
>> who wants to build a cluster in your home or company using kit which you
>> already have.
>> Or are you a company which wants to set up an AI infrastructure?
>>
>> By the way, I think you are thinking of a CPU cluster and scaling out
>> using Beowulf concepts.
>> In that case you are looking at Horovod
>> https://github.com/horovod/horovod
>> One thing though - for AI applications it is common to deploy Beowulf
>> clusters which have servers with GPUs as part of their specification.
>>
>>
>> I think it will be clear to you soon that you will be overwhelmed with
>> options and opinions.
>> Firstly, join the hpc.social community and introduce yourself on the Slack
>> channel "introductions".
>> I would start with the following resources:
>>
>> https://www.clustermonkey.net/
>> https://www.nvidia.com/en-gb/data-center/bright-cluster-manager/
>> https://catalog.ngc.nvidia.com/containers
>> https://openhpc.community/
>> https://ciq.com/
>> https://qlustar.com/
>>
>> https://www.delltechnologies.com/asset/en-nz/products/ready-solutions/technical-support/omnia-solution-overview.pdf
>> https://omnia-doc.readthedocs.io/en/latest/index.html
>>
>> Does anyone know if the Bright Easy8 licenses are available? I would say
>> that building a test cluster with Easy8 would be the quickest way to get
>> some hands-on experience.
>>
>> You should of course consider cloud providers:
>> https://aws.amazon.com/hpc/parallelcluster/
>>
>> https://azure.microsoft.com/en-gb/solutions/high-performance-computing/#intro
>> https://cloud.google.com/solutions/hpc
>> https://go.oracle.com/LP=134426
>>
>> On Fri, 28 Jul 2023 at 01:10, Andrew Falgout <andrew.falgout at gmail.com>
>> wrote:
>>
>>> So I'm interested to see if a Beowulf Cluster could be used for Machine
>>> Learning, LLM training, and LLM inference.  Anyone know where a good entry
>>> point is for learning Beowulf Clustering?
>>>
>>>
>>> ./Andrew Falgout
>>> KG5GRX
>>>
>>>
>>> On Wed, Jul 26, 2023 at 8:39 AM Michael DiDomenico <
>>> mdidomenico4 at gmail.com> wrote:
>>>
>>>> just a mailing list as far as i know.  it used to get a lot more
>>>> traffic, but seems to have simmered down quite a bit
>>>>
>>>> On Tue, Jul 25, 2023 at 6:50 PM Andrew Falgout <
>>>> andrew.falgout at gmail.com> wrote:
>>>> >
>>>> > Just curious, do we have a discord channel, or just a mailing list?
>>>> >
>>>> >
>>>> > ./Andrew Falgout
>>>> > KG5GRX
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico <mdidomenico4 at gmail.com> wrote:
>>>> >>
>>>> >> ugh, as someone who worked the front lines in the 00's i got a front
>>>> >> row seat to the interconnect mud slinging...  but frankly if they're
>>>> >> going to come out of the gate with a product named "Ultra Ethernet",
>>>> >> i smell a loser... :) (sarcasm...)
>>>> >>
>>>> >> https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/
>>>> >> _______________________________________________
>>>> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>> >> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>>
>>>
>>