[Beowulf] New member, upgrading our existing Beowulf cluster

Joshua Baker-LePain jlb17 at duke.edu
Thu Dec 3 11:35:45 PST 2009

On Thu, 3 Dec 2009 at 2:29pm, Mark Hahn wrote

>>> if a single node goes down, you need to take down all the
>>> nodes in the chassis before you can remove the dead node. Not very
>>> practical.
>> Eh? What's so hard about marking the other nodes as unusable in your
>> batch system, and waiting for them to become free?
> depends on your max job length.  but yeah, idling three nodes for a week
> is not going to be noticeable in anything but a quite small cluster...

But doesn't the engineer in you just bristle at the (admittedly, rather 
slight) inefficiency?  Call me OCD (you wouldn't be the first), but it 
just bugs me.

Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
