[Beowulf] First cluster in 20 years - questions about today

Renfro, Michael Renfro at tntech.edu
Sun Feb 2 10:23:31 PST 2020

I don’t see anything wrong in Jonathan’s advice. I’d also add that for computational chemistry, depending on model size and CPU, a gigabit network could be a bottleneck if you use a multi-node solver. I’ve seen bottlenecks on small models even with FDR InfiniBand. GPUs could also be useful for compatible solvers.


(From https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD)
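To put rough numbers on the interconnect point above, here is a minimal sketch of a latency-plus-bandwidth estimate for a single message. The latency and bandwidth figures are assumed ballpark values chosen for illustration, not measurements from any cluster mentioned in this thread, and the function names are hypothetical.

```python
# Back-of-envelope model: time = latency + bytes / bandwidth.
# All link figures below are ASSUMED round numbers, not measurements.

def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to send one message over a link."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# Assumed link parameters (hypothetical values for illustration only).
LINKS = {
    # ~1 Gb/s is 125 MB/s; latency for TCP over gigabit assumed ~50 microseconds
    "gigabit Ethernet": {"latency_s": 50e-6, "bandwidth": 125e6},
    # 4x FDR assumed at roughly 6.8 GB/s with latency assumed ~1 microsecond
    "FDR InfiniBand": {"latency_s": 1e-6, "bandwidth": 6.8e9},
}

def compare(message_bytes):
    """Return the estimated transfer time per link for one message size."""
    return {
        name: transfer_time(message_bytes, p["latency_s"], p["bandwidth"])
        for name, p in LINKS.items()
    }

if __name__ == "__main__":
    for size in (1_000, 100_000, 10_000_000):
        for name, seconds in compare(size).items():
            print(f"{size:>10} bytes over {name}: {seconds * 1e6:.1f} microseconds")
```

Under these assumptions, small messages are dominated by latency and large ones by bandwidth, so a solver that exchanges many small messages per step can stall on gigabit long before the link is saturated. Real behavior depends on the solver, the MPI stack, and the switch, so treat this as a way to frame the question rather than a prediction.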

So heterogeneity may not be a problem if you stick with high-core-count OpenMP jobs, which stay within a single node. A scheduler like Slurm could be useful if you want to stack up a bunch of jobs in advance. Cluster management software might not be strictly required for a few-node single-user cluster, but something like OpenHPC isn’t too hard to get running.
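As an illustration of stacking up jobs in advance, here is a minimal sketch that builds single-node Slurm batch scripts for OpenMP runs and can pipe them to sbatch. The solver command, input file names, job names, and core count are hypothetical placeholders; the #SBATCH options and the SLURM_CPUS_PER_TASK variable are standard Slurm, but the right values depend on your nodes.

```python
import subprocess

def make_job_script(name, command, cpus):
    """Return the text of a single-node Slurm batch script for an OpenMP job."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        "#SBATCH --nodes=1",            # OpenMP jobs stay on one node
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        # Match the OpenMP thread count to the cores Slurm allocated.
        "export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK",
        command,
        "",
    ])

def submit(script_text):
    """Pipe the script to sbatch on stdin; requires Slurm to be installed."""
    result = subprocess.run(
        ["sbatch"], input=script_text, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Hypothetical solver and inputs; replace with your own.
    for i in range(3):
        script = make_job_script(f"run{i}", f"./my_solver input{i}.dat", cpus=16)
        print(script)  # on a machine with Slurm, call submit(script) instead
```

Queued this way, the jobs run one after another as cores free up, and on a mixed cluster a per-job --cpus-per-task value lets the scheduler place each job on a node that is big enough for it.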

If you were going to keep the old (2-core?) nodes around, I’d probably turn them into a storage cluster (Ceph, Gluster, etc.). No idea if their power draw is too high to be worthwhile, though.

Mike Renfro, PhD  / HPC Systems Administrator, Information Technology Services
931 372-3601      / Tennessee Tech University

On Feb 1, 2020, at 9:21 PM, Mark Kosmowski <mark.kosmowski at gmail.com> wrote:

I've been out of computation for about 20 years, since my master's degree. I'm getting into the game again as a private individual. When I was active, Opteron had just launched. I was an early adopter of amd64 because I needed the RAM (maybe more accurately, I needed to thoroughly thrash my swap drives). I never needed any cluster management software with my 3-node, dual-socket, single-core little baby Beowulf. My planned domain is computational chemistry, and I'm hoping to get to a point where I can do ab initio catalyst surface reaction modeling of small molecules (not biomolecules).

I'm planning to add a few nodes, and the cluster will end up being fairly heterogeneous. My initial plan is to add two or three multi-socket, multi-core nodes as well as a 48-port gigabit switch. How should I assess whether to have one big heterogeneous cluster vs. two smaller quasi-homogeneous clusters?

Will it be worthwhile to learn a cluster management package? If so, suggestions?

Should I consider Solaris or illumos?  I do plan on using ZFS, especially for the data node, but I want as much redundancy as I can get, since I'm going to be using used hardware.  Will the fancy Solaris cluster tools be useful?

Also, once I get running, while I'm getting current with theory and software, may I inquire here about taking on a small, low-priority academic project to make sure the cluster side is working well?

Thank you all for still being here!
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
[Scrubbed image attachments: ibverbs-speedup.png, 3M.png]
