[Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Thu Sep 13 14:32:09 PDT 2012

Comments interspersed (and gosh, I wish Outlook dealt with threading better)

Jim Lux

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Prentice Bisbal
Sent: Thursday, September 13, 2012 7:14 AM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer

On 09/12/2012 07:52 PM, Mark Hahn wrote:
> for the record, setting up LDAP is trivial.  actually, configuring a 
> whole cluster with stateless nodes is a pretty straightforward checklist...
Yes and no. It's easy for you and me because we're professional system administrators who have been doing this for years. However, we're talking about a class on building clusters for students who may have little or no system administration experience. Setting up a stateless cluster is more difficult than setting up a stateful cluster; there are more issues to worry about (DHCP, network booting, etc.).

>>> Exactly. You want people to get "beyond the cookbook" to appreciate the sorts of issues that crop up, to know that they can get help, and to know how to ask for help intelligently. Teach fishing, etc.

>> I'd really like to know what challenges people are facing in this area.
>> Specific pain points.
> funding.  vendor lockin/licensing.
> lack of design standard for water cooling.
> 10G switches that freeze under load.
> installing and running clusters is easy.  it's the other stuff that's hard.
I have to agree here. For an experienced system admin, building and running a basic cluster isn't too hard, but the devil is in the details. 
My biggest problems have always been people and politics. Some examples:

- Management that doesn't understand clusters, or that takes the vendor's recommendations over those of the in-house expert(s)

>>> That is because "in-house experts" need marketing training: cluster admin is easy; marketing is hard, because at a very fundamental level, it's not procedural and rules-based.  It's a (sadly much-maligned) "soft skill".  This is what separates an effective cluster manager from a cluster administrator, by the way.

- Vendors who try to sell you what they have, instead of what you need ("InfiniBand really isn't any better than Ethernet", or "You don't need a parallel filesystem. Our network-attached storage device has plenty of performance")
- Getting others to understand the importance of adequate power and cooling in the data center. A cluster is useless if you have to shut it down periodically because the data center is overheating.

>>> Marketing skills again.  17 pages of thermal analysis with models and such isn't what you need.  What you need is 1 or 2 slides that succinctly explain the problem and solution, in the context of the "other person".

- Explaining to users that they can't run commercial software package X on the cluster because there's no volume discount, and the vendor charges too much per node or per instance for us to buy enough licenses. Ohhh... and their department refused to contribute to the cluster budget.
- And then there's the difficult users...

