[Beowulf] 2 starting questions on how I should proceed to correctly build a first micro-cluster (2 nodes)

Marco Ippolito ippolito.marco at gmail.com
Sun Mar 3 11:32:36 PST 2019


Thank you very much Greg, Douglas, John and Michael.
You very kindly "overwhelmed" me, and I thank you for that, with hints about
things I didn't know. My very next step will be to understand each of your
hints, and I expect I will come back with some more practical questions
about them.
In the meanwhile, just to clarify something of my project:
- for the heavy computing part I'm using C++, and for the web-server side,
Go
- to speed up the heaviest computing, after trying other tools I came to
the conclusion that HPX, despite being quite complicated, can help
- I would like to use Kafka as a distributed message broker between the Go
web server and the C++ computing parts.
  Being essentially a distributed, fault-tolerant, append-only log, it can
help keep all the parts "playing the same tune" (a minimal producer sketch
follows this list)
- I've been using Ubuntu as my OS and, if possible, I would like to keep
using it in the distributed environment as well.
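
To make the Kafka point concrete, here is a minimal sketch of a producer on
the Go web-server side. The broker address (node01:9092), the topic name
("compute-jobs") and the message format are placeholders, and I'm assuming
the sarama client library; the C++ computing parts would consume the same
topic, e.g. via librdkafka:

    package main

    import (
        "log"

        "github.com/Shopify/sarama"
    )

    func main() {
        // Placeholder broker address and topic; adjust to the real cluster.
        brokers := []string{"node01:9092"}
        topic := "compute-jobs"

        config := sarama.NewConfig()
        // The SyncProducer requires successes to be reported back.
        config.Producer.Return.Successes = true

        producer, err := sarama.NewSyncProducer(brokers, config)
        if err != nil {
            log.Fatalf("cannot create producer: %v", err)
        }
        defer producer.Close()

        // Hand a compute job to the C++ workers via the append-only log.
        msg := &sarama.ProducerMessage{
            Topic: topic,
            Value: sarama.StringEncoder(`{"job_id": 1, "payload": "..."}`),
        }
        partition, offset, err := producer.SendMessage(msg)
        if err != nil {
            log.Fatalf("cannot send message: %v", err)
        }
        log.Printf("job queued at partition %d, offset %d", partition, offset)
    }

Since the log is append-only and replicated, the web server and the compute
nodes never talk to each other directly; they only have to agree on the
topic names and the message format.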

Marco

On Sun, Mar 3, 2019 at 6:10 PM Greg Keller <gregwkeller at gmail.com>
wrote:

> I Third OpenHPC, or at least the Warewulf underpinnings in it.
> http://warewulf.lbl.gov/
>
> For "learning" the software stack you may consider beefing up your current
> node and running virtualized environment inside it?  I use the community
> version of Proxmox (https://www.proxmox.com/en/downloads). On Ubuntu
> Virt-Manager+QEMU+KVM is equally capable but a bit less obvious for
> configuring VMS & Containers.  Running 3 nodes, each with 8GB RAM and
> leaving 8GB for the host should be sufficient to get the software setup and
> test the basic adminish stuff and strategy.
>
> The key things for a real cluster IMHO are:
> 1) SSH Configuration - ssh keys for passwordless access to all compute nodes
> 2) a shared filesystem - NFS, Lustre, or, for virtual machines on a severe
> budget, Plan 9 (https://en.wikipedia.org/wiki/9P_(protocol)).  Maybe put
> this NFS and a couple of old disks on an old Atom-based machine you've
> been holding the door open with.
> 3) A capable scheduler - Slurm is a current favorite, but there are
> several tried-and-true options that may be better for your specific project
> 4) Systems management.  RAM-based filesystems like the ones Warewulf
> supports are great because a reboot ensures that any bit-rot on a "node"
> is fixed... especially if you format the local "scratch" hard disk on boot
> :).  I see a lot of Ansible and other methods that seem popular but are
> above my pea brain or budget.
> 5) Parallel shells.  I used PDSH a lot, but several such tools have been
> written over the years. You almost can't have too many ways to run
> commands in parallel (see the sketch just after this list).
> 6) Remote power control and consoles - IPMI/BMC or equivalent is a
> must-have when you scale up, but it would be good to have in the starter
> kit too.  Even some really low-end stuff has them these days, and it's a
> feature you'll quickly consider essential.  For a COTS cluster without a
> built-in BMC, this looks promising... https://github.com/Fmstrat/diy-ipmi
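>
> To make point 5 concrete, here is a minimal pdsh-style sketch (not a
> replacement for the real tools). It assumes the passwordless ssh keys from
> point 1 are already in place, and the node names you pass it are
> placeholders for your own:
>
>     // prun.go - run one command on many nodes over ssh, in parallel.
>     // Usage: go run prun.go "uptime" node01 node02 node03
>     package main
>
>     import (
>         "log"
>         "os"
>         "os/exec"
>         "sync"
>     )
>
>     func main() {
>         if len(os.Args) < 3 {
>             log.Fatalf("usage: %s <command> <node> [node ...]", os.Args[0])
>         }
>         command, nodes := os.Args[1], os.Args[2:]
>
>         var wg sync.WaitGroup
>         for _, node := range nodes {
>             wg.Add(1)
>             go func(node string) {
>                 defer wg.Done()
>                 // Relies on key-based ssh (point 1); nothing prompts for a password.
>                 out, err := exec.Command("ssh", node, command).CombinedOutput()
>                 if err != nil {
>                     log.Printf("%s: error: %v", node, err)
>                 }
>                 log.Printf("%s:\n%s", node, out)
>             }(node)
>         }
>         wg.Wait()
>     }
>
> PDSH and friends add the important extras (node range syntax, fan-out
> limits, collated output), which is why you still end up collecting several
> of these tools anyway.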
>
> Not really required, but I'll mention my good friends Screen and Byobu,
> which have saved my bacon many times when an unexpected disconnect (power,
> network, etc.) of my client would otherwise have left a system in an
> unknown state.
>
> Bonus points for folks who manage & monitor the cluster.  When something's
> broken, does the system tell you before the users do?  If yes, you have the
> "Right Stuff" being monitored.
>
> For me, the notion that clusters can't be heterogeneous is overstated.
> Assuming you compile on a given node (a master or login node, or a shell to
> a compute node with a dev environment installed), at a minimum you want the
> code to run on the other nodes.  Similar generations of processors make
> this pretty likely.  Identical hardware makes it simple, but is probably
> not worth the cost in an experiment/learning environment unless you plan to
> benchmark results.  Setting up queues of identical nodes, so that a code
> runs efficiently on a given subset of nodes, is a fair compromise.  None of
> this matters in the virtual machine environment if you decide to start
> there.
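>
> A cheap way to sanity-check that on Linux nodes is to compare CPU feature
> flags: if every compute node reports at least the flags of the node you
> build on, binaries built there will generally run everywhere.  A tiny
> sketch you could push through a parallel shell (file name is a placeholder):
>
>     // cpuflags.go - print this node's CPU feature flags, sorted, one per line.
>     package main
>
>     import (
>         "bufio"
>         "fmt"
>         "os"
>         "sort"
>         "strings"
>     )
>
>     func main() {
>         f, err := os.Open("/proc/cpuinfo")
>         if err != nil {
>             fmt.Fprintln(os.Stderr, err)
>             os.Exit(1)
>         }
>         defer f.Close()
>
>         scanner := bufio.NewScanner(f)
>         for scanner.Scan() {
>             line := scanner.Text()
>             if strings.HasPrefix(line, "flags") {
>                 fields := strings.SplitN(line, ":", 2)
>                 flags := strings.Fields(fields[1])
>                 sort.Strings(flags)
>                 fmt.Println(strings.Join(flags, "\n"))
>                 return // the flags line repeats per core; one copy is enough
>             }
>         }
>     }
>
> Diffing the output from each node shows immediately which machines are the
> odd ones out.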
>
> And everything Doug just said... :)
>
> On Sun, Mar 3, 2019 at 3:25 AM John Hearns via Beowulf <
> beowulf at beowulf.org> wrote:
>
>> I second OpenHPC. It is actively maintained and easy to set up.
>>
>> Regarding the hardware, have a look at Doug Eadline's Limulus clusters. I
>> think they would be a good fit.
>> Doug's site is excellent in general: https://www.clustermonkey.net/
>>
>> Also some people build Raspberry Pi clusters for learning.
>>
>>
>> On Sun, 3 Mar 2019 at 01:16, Renfro, Michael <Renfro at tntech.edu> wrote:
>>
>>> Heterogeneous hardware is possible, but the slower system will be a bottleneck if
>>> you have calculations that require both systems to work in parallel and
>>> synchronize with each other periodically. You might also find bottlenecks
>>> with your network interconnect, even on homogeneous systems.
>>>
>>> I’ve never used ROCKS, and OSCAR doesn’t appear to have been updated in a
>>> few years (maybe it doesn’t need to be). OpenHPC is a similar product,
>>> more recently updated. But except for the cluster I manage now, I always
>>> just went with a base operating system for the nodes and added HPC
>>> libraries and services as required.
>>>
>>> > On Mar 2, 2019, at 7:34 AM, Marco Ippolito <ippolito.marco at gmail.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I'm developing an application which needs to use tools and other
>>> applications that excel in a distributed environment:
>>> > - HPX ( https://github.com/STEllAR-GROUP/hpx ) ,
>>> > - Kafka ( http://kafka.apache.org/ )
>>> > - a blockchain tool.
>>> > This is why I'm eager to learn how to deploy a Beowulf cluster.
>>> >
>>> > I've read some info here:
>>> > - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
>>> > - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
>>> > -
>>> https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
>>> >
>>> > And I have 2 starting questions to clarify how I should proceed to
>>> build the cluster correctly:
>>> >
>>> > 1) My starting point is the PC I'm working with at the moment, with
>>> these features:
>>> >   - Corsair DDR3 RAM modules, PC1600, 32GB, CL10 Ven k
>>> >   - Intel Core i7-4790K boxed CPU (socket 1150), 4.00 GHz
>>> >   - Samsung MZ-76E500B 860 EVO internal SSD, 500 GB, 2.5" SATA
>>> III, black/grey
>>> >   - ASUS H97-PLUS motherboard
>>> >   - DVD-RW drive
>>> >
>>> >   I'm using Ubuntu 18.04.01 Server Edition as the OS.
>>> >
>>> > On one side I read that it would be better to put the same type of HW
>>> in the same cluster: PCs of the same type;
>>> > but on the other side, heterogeneous HW (servers or PCs) can also be
>>> deployed.
>>> > So... which HW should I take into consideration for the second node, if
>>> the features of the very first "node" are the ones above?
>>> >
>>> > 2) I read that some software (Rocks, OSCAR) would make the cluster
>>> configuration easier and smoother. But I also read that using the same
>>> > OS, with exactly the same version, on all nodes - in my case Ubuntu
>>> 18.04.01 Server Edition - could be a safe start.
>>> > So... is it strictly necessary to use Rocks or OSCAR to correctly
>>> configure the nodes' network?
>>> >
>>> > Looking forward to your kind hints and suggestions.
>>> > Marco
>>> >
>>> >