[Beowulf] 2 starting questions on how I should proceed for a correct first micro-cluster (2-nodes) building

Greg Keller gregwkeller at gmail.com
Sun Mar 3 09:08:32 PST 2019

I third OpenHPC, or at least the Warewulf underpinnings in it.

For "learning" the software stack, you might consider beefing up your current
node and running a virtualized environment inside it.  I use the community
version of Proxmox (https://www.proxmox.com/en/downloads).  On Ubuntu,
Virt-Manager+QEMU+KVM is equally capable but a bit less obvious for
configuring VMs & containers.  Running 3 nodes, each with 8GB RAM, and
leaving 8GB for the host (32GB in total, which is what your current PC has)
should be sufficient to get the software set up and to test the basic
admin-ish stuff and strategy.

The key things for a real cluster IMHO are:
1) SSH configuration - ssh keys for passwordless access to all compute nodes
2) A shared filesystem - NFS, Lustre, or, for virtual machines on a severe
budget, Plan 9's 9P protocol (https://en.wikipedia.org/wiki/9P_(protocol)).
Maybe put the NFS server and a couple of old disks on an old Atom-based
machine you've been holding the door open with.
3) A capable scheduler - Slurm is a current favorite, but there are several
tried and true options that may be better for your specific project
4) Systems management.  RAM-based filesystems like the ones Warewulf supports
are great because a reboot ensures that any bit-rot on a "node" is fixed...
especially if you format the local "scratch" hard disk on boot :).  I see a
lot of Ansible and other methods that seem popular but are above my pea brain
or budget.
5) Parallel shells.  I have used pdsh a lot, but several other attempts have
been made over the years.  You almost can't have too many ways to run in
parallel.
6) Remote power control and consoles - IPMI/BMC or equivalent is a must-have
when you scale up, but it would be good to have in the starter kit
too.  Even some really low-end stuff has it these days, and it's a feature
you'll quickly consider essential.  For a COTS cluster without a built-in
BMC, this looks promising: https://github.com/Fmstrat/diy-ipmi
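
As a rough illustration of items 1 and 5 together, here is a minimal Python
sketch of a pdsh-like helper.  It is not pdsh and not how any particular tool
works: the names ssh_runner and run_on_hosts are made up for this example,
and it assumes key-based ssh login to every node already works.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh and return (returncode, output).

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a node with missing keys shows up as an error rather than a hang.
    """
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ssh binary missing, or the command ran longer than `timeout`.
        return 255, str(exc)


def run_on_hosts(hosts, command, runner=ssh_runner, max_workers=8):
    """Run `command` on all hosts concurrently; return {host: (rc, output)}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `hosts`.
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))
```

Something like run_on_hosts(["node01", "node02"], "uname -r") (hypothetical
node names) returns a dict keyed by host.  The runner is a parameter so the
fan-out logic can be tried with a fake runner before pointing it at real
nodes.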

Not really required, but I'll mention my good friends Screen and Byobu, which
have saved my bacon many times when an unexpected disconnect (power,
network, etc.) of my client would otherwise have left a system in an unknown
state.

Bonus points for folks who manage & monitor the cluster.  When something's
broken, does the system tell you before the users do?  If yes, you have the
"Right Stuff" being monitored.

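To make the "tell you before the users" idea concrete, here is a very small
Python sketch of one possible check: is the ssh port on each node reachable?
This is only an assumption about what is worth watching, not a replacement
for a real monitoring system, and the function names are hypothetical.

```python
import socket


def port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def find_down_nodes(nodes, check=port_open):
    """Return the nodes for which `check` fails, in input order."""
    return [node for node in nodes if not check(node)]
```

Run from cron on the master node, find_down_nodes(["node01", "node02"])
(hypothetical names) returning a non-empty list would be the cue to send
yourself a mail.
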
For me, the notion that clusters must not be heterogeneous is overstated.
Assuming you compile on a given node (a master or login node, or a shell on a
compute node with a dev environment installed), at a minimum you want the
code to run on the other nodes.  Similar generations of processors make
this pretty likely.  Identical processors make it simple, but that is
probably not worth the cost in an experiment/learning environment unless you
plan to benchmark results.  Setting up queues of identical nodes, so that a
code runs efficiently on a given subset of nodes, is a fair compromise.  None
of this matters in the virtual machine environment if you decide to start
there.
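
A small Python sketch of the "queues of identical nodes" idea: given an
inventory of node name to CPU model, group the nodes so each group could
back its own queue.  The function name and the inventory format are
assumptions for illustration; how you collect the CPU model (for example
from /proc/cpuinfo) and how you define the queues in your scheduler are left
out.

```python
from collections import defaultdict


def group_by_cpu(node_cpus):
    """Map each CPU model to the sorted list of nodes that have it."""
    groups = defaultdict(list)
    for node, cpu in node_cpus.items():
        groups[cpu].append(node)
    return {cpu: sorted(nodes) for cpu, nodes in groups.items()}
```

For example, an inventory of {"node01": "i7-4790K", "node02": "i7-4790K",
"node03": "Atom"} (hypothetical nodes) gives one group of two identical nodes
and one group with the odd machine out.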

And everything Doug just said... :)

On Sun, Mar 3, 2019 at 3:25 AM John Hearns via Beowulf <beowulf at beowulf.org>
wrote:

> I second OpenHPC. It is actively maintained and easy to set up.
> Regarding the hardware, have a look at Doug Eadline's Limulus clusters. I
> think they would be a good fit.
> Doug's site is excellent in general: https://www.clustermonkey.net/
> Also some people build Raspberry Pi clusters for learning.
> On Sun, 3 Mar 2019 at 01:16, Renfro, Michael <Renfro at tntech.edu> wrote:
>> Heterogeneous is possible, but the slower system will be a bottleneck if
>> you have calculations that require both systems to work in parallel and
>> synchronize with each other periodically. You might also find bottlenecks
>> with your network interconnect, even on homogeneous systems.
>> I’ve never used ROCKS, and OSCAR doesn’t look to have been updated in a
>> few years (maybe it doesn’t need to be). OpenHPC is a similar product, more
>> recently updated. But except for the cluster I manage now, I always
>> just went with a base operating system for the nodes and added HPC
>> libraries and services as required.
>> > On Mar 2, 2019, at 7:34 AM, Marco Ippolito <ippolito.marco at gmail.com>
>> wrote:
>> >
>> > Hi all,
>> >
>> > I'm developing an application which needs to use tools and other
>> > applications that excel in a distributed environment:
>> > - HPX ( https://github.com/STEllAR-GROUP/hpx ) ,
>> > - Kafka ( http://kafka.apache.org/ )
>> > - a blockchain tool.
>> > This is why I'm eager to learn how to deploy a beowulf cluster.
>> >
>> > I've read some info here:
>> > - https://en.wikibooks.org/wiki/Building_a_Beowulf_Cluster
>> > - https://www.linux.com/blog/building-beowulf-cluster-just-13-steps
>> > -
>> https://www-users.cs.york.ac.uk/~mjf/pi_cluster/src/Building_a_simple_Beowulf_cluster.html
>> >
>> > And I have 2 starting questions in order to clarify how I should
>> proceed for a correct cluster building:
>> >
>> > 1) My starting point is the PC I'm working with at the moment, with
>> > these features:
>> >   - Corsair SIMM RAM, DDR3, PC1600, 32GB, CL10 Ven k
>> >   - Intel Ci7 Box CPU 1150 i7-4790K, 4.00 GHz
>> >   - Samsung MZ-76E500B internal SSD 860 EVO, 500 GB, 2.5" SATA III,
>> >     black/grey
>> >   - MB ASUS H97-PLUS
>> >   - DVD-RW drive
>> >
>> >   I'm using Ubuntu 18.04.01 Server Edition as the OS.
>> >
>> > On one side I read that it should be better to put in the same cluster
>> > the same type of HW: PCs of the same type,
>> > but on the other side heterogeneous HW (servers or PCs) can also
>> > be deployed.
>> > So... which HW should I take into consideration for the second node, if
>> > the features of the very first "node" are the ones above?
>> >
>> > 2) I read that some software (Rocks, OSCAR) would make the cluster
>> > configuration easier and smoother. But I also read that
>> > using the same OS,
>> > with exactly the same version, for all nodes, in my case Ubuntu 18.04.01
>> > Server Edition, could be a safe starter.
>> > So... is it strictly necessary to use Rocks or OSCAR to correctly
>> > configure the nodes' network?
>> >
>> > Looking forward to your kind hints and suggestions.
>> > Marco
>> >
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf