[Beowulf] Project Planning: Storage, Network, and Redundancy Considerations

John Hearns john.hearns at streamline-computing.com
Mon Mar 19 09:42:44 PDT 2007

Brian R. Smith wrote:
> Hey list,
> 1. Proprietary parallel storage systems (like Panasas, etc.):  It 
> provides the per-node bandwidth, aggregate bandwidth, caching 
> mechanisms, fault-tolerance, and redundancy that we require (plus having 
> a vendor offering 24x7x365 support & 24 hour turnover is quite a breath 
> of fresh air for us).  Price point is a little high for the amount of 
> storage that we will get though, little more than doubling our current 
> overall capacity.  As far as I can tell, I can use this device as a 
> permanent data store (like /home) and also as the user's scratch space 
> so that there is only a single point for all data needs across the 
> cluster.  It does, however, require the installation of vendor kernel 
> modules which do often add overhead to system administration (as they 
> need to be compiled, linked, and tested before every kernel update).

If you like Panasas, go with them.
The kernel module thing isn't all that big a deal - they are quite 
willing to 'cook' the modules for you.
But YMMV.

> Our final problem is a relatively simple one though I am definitely a 
> newbie to the H.A. world.  Under this consolidation plan, we will have 
> only one point of entry to this cluster and hence a single point of 
> failure.  Have any beowulfers had experience with deploying clusters 
> with redundant head nodes in a pseudo-H.A. fashion (heartbeat 
> monitoring, fail-over, etc.) and what experiences have you had in
> adapting your resource manager to this task?  Would it simply be more 
> feasible to move the resource manager to another machine at this point 
> (and have both headnodes act as submit and administrative clients)?  My 
> current plan is unfortunately light on the details of handling SGE in 
> such an environment.  It includes purchasing two identical 1U boxes 
> (with good support contracts).  They will monitor each other for 
> availability and the goal is to have the spare take over if the master 
> fails.  While the spare is not in use, I was planning on dispatching 
> jobs to it.

I have constructed several clusters using HA.
I believe Joe Landman has also - as you are in the States why not give 
some thought to contacting Scalable and getting them to do some more 
detailed designs for you?

The first way, which I have used on several clusters, is Linux-HA and 
heartbeat. This is an active/passive setup, with a primary and a backup 
head node. On failover, the backup head node starts up cluster services.
Failing over SGE is (relatively) easy - the main part is making sure 
that the cluster spool directory is on shared storage, and that the 
shared storage gets mounted on one machine or the other :-)
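
To make the active/passive idea concrete, here is a minimal Python sketch of the decision the backup head node makes. It is not how Linux-HA or heartbeat is implemented or configured; the class name, the 30 second deadtime and the action strings are all assumptions for illustration. The only points taken from the setup described above are that the backup stays passive while it hears the primary, and that on failover it brings up the shared storage before the services that need it.

```python
import time
from dataclasses import dataclass, field


@dataclass
class BackupHeadNode:
    """Backup node in an active/passive pair (illustrative only)."""
    deadtime: float = 30.0   # assumed: seconds of silence before taking over
    last_beat: float = field(default_factory=time.monotonic)
    active: bool = False
    actions: list = field(default_factory=list)

    def heartbeat_received(self, now=None):
        """Record that the primary was heard from."""
        self.last_beat = time.monotonic() if now is None else now

    def check(self, now=None):
        """Take over once the primary has been silent longer than deadtime."""
        now = time.monotonic() if now is None else now
        if not self.active and now - self.last_beat > self.deadtime:
            self.take_over()
        return self.active

    def take_over(self):
        # Order matters: storage first, then the services that live on it.
        # Here the steps are only recorded, not executed.
        self.actions.append("mount shared storage (SGE spool)")
        self.actions.append("start sge_qmaster")
        self.active = True
```

In a real deployment the heartbeat package does the monitoring and runs resource scripts in place of the two recorded actions; the sketch only shows the ordering and the timeout logic.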

The harder part is failing over NFS - again I've done it.
I gather there is a wrinkle or two with NFS v4 on Linux-HA type systems.

The second way to do this would be to look at using shared storage,
and using the Gridengine queue master failover mechanism. This is a 
different approach, in that you have two machines running, using either 
a NAS type storage server or Panasas/Lustre. The SGE spool directory is 
on this, and the SGE qmaster will start on the second machine if the 
first fails to answer its heartbeat.
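
The second approach can be sketched in the same spirit. The Python below is only loosely modelled on the idea of a standby machine watching a heartbeat on shared storage; it is not the Gridengine implementation. The file format (a single integer counter), the class name and the threshold of three stale checks are assumptions made for the example.

```python
from pathlib import Path


def read_heartbeat(path):
    """Return the integer counter stored in the heartbeat file, or None."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


class ShadowWatcher:
    """Standby-side watcher for a heartbeat file on shared storage."""

    def __init__(self, path, max_stale_checks=3):
        self.path = path
        self.max_stale_checks = max_stale_checks  # assumed threshold
        self.last_seen = read_heartbeat(path)
        self.stale_checks = 0

    def poll(self):
        """Call once per check interval; True means 'start the qmaster here'."""
        current = read_heartbeat(self.path)
        if current is not None and current != self.last_seen:
            # The counter moved, so the first machine is still alive.
            self.last_seen = current
            self.stale_checks = 0
        else:
            # Unchanged or unreadable counts as a missed heartbeat.
            self.stale_checks += 1
        return self.stale_checks >= self.max_stale_checks
```

Note that both machines must see the same spool directory for this to work, which is why the spool sits on the NAS or Panasas/Lustre storage rather than on a local disk.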

ps. 1U boxes? Think something a bit bigger - with hot swap PSUs.
You also might have to fit a second network card for your HA heartbeat 
link (link plural - you need two links) plus a SCSI card, so think 
slightly bigger boxes for the two head nodes.
You can spec 1U nodes for interactive login/compile/job submission 
nodes. Maybe you could run a DNS round robin type load balancer for 
redundancy on these boxes - they should all be similar, and if one stops 
working then ho-hum.
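
As a rough illustration of why round robin DNS gives some redundancy for the login nodes: the name resolves to several addresses, and a client (or a wrapper script) can simply try the next one if the first is dead. The Python below is a sketch under that assumption; the function names and the default port 22 are made up for the example, and plain round robin DNS does no health checking of its own.

```python
import socket


def resolve_all(hostname, port=22):
    """Return every distinct address DNS hands back for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    seen = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def first_reachable(addresses, port=22, timeout=3.0,
                    connect=socket.create_connection):
    """Try each address in order and return the first that accepts a connection."""
    for addr in addresses:
        try:
            connect((addr, port), timeout=timeout).close()
            return addr
        except OSError:
            continue  # this login node is down, try the next one
    return None
```

Usage would be something like first_reachable(resolve_all("login.cluster.example")), where the hostname is hypothetical.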

pps. "when the spare is not in use dispatching jobs to it"
Actually, we also do a cold failover setup which is just like that, and 
the backup node is used for running jobs when it is idle.

      John Hearns
      Senior HPC Engineer
      Streamline Computing,
      The Innovation Centre, Warwick Technology Park,
      Gallows Hill, Warwick CV34 6UW
      Office: 01926 623130 Mobile: 07841 231235
