[Beowulf] Mark Hahn's Beowulf/Cluster/HPC mini-FAQ for newbies & some further thoughts

Tue Nov 6 06:46:05 PST 2012

On 11/05/2012 08:14 PM, Christopher Samuel wrote:
> On 05/11/12 20:02, Mark Hahn wrote:
>
>>> For serious work, the cluster and its software needs to survive
>>> power outages,
>> well, it's a cost-benefit tradeoff.  my organization has no power
>> protection on any compute nodes, though we do UPSify the storage.
> Agreed - we do the same here at VLSCI.   Having UPS for our IBM
> systems (iDataplex and BlueGene/Q) wouldn't make sense given our power
> around here is pretty good (touch wood) and would have had a sizeable
> impact on the cost of the data centre to house them. However, all our
> storage and infrastructure stuff like management node, etc, are on UPS.
>
> Our SGI cluster though is on UPS, it's in a different data centre
> where all racks are on UPS (DRUPS in this case) and there is no
> non-protected option.
>

For those too lazy to google 'DRUPS', I did it for you:

http://en.wikipedia.org/wiki/Diesel_rotary_uninterruptible_power_supply

At my previous employer, my cluster was only 64 nodes, so it's a bit 
smaller than most of your clusters, I think. The head node had software 
from the UPS vendor installed so it could listen for a signal from the 
UPS. As soon as the UPS lost incoming power and switched to battery, the 
head node did the following (I was using SGE, so some of this 
terminology is SGE-specific):

1. Disable all queues on all execution nodes.
2. Kill all running jobs and requeue them at the same time. Since all 
queues were disabled, the jobs would sit in the scheduler queue but not 
run.
3. Shut down all cluster nodes using IPMI.
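The three steps above can be sketched as a dry-run script that only builds and prints the commands. This is a minimal sketch, not the script actually used: the node names, the "<node>-ipmi" BMC hostnames, the IPMI user and the password file are all hypothetical, and it assumes SGE's `qmod -d` / `qmod -rq` and ipmitool's `chassis power soft` behave as described in their man pages on your version. Requeueing also assumes jobs are rerunnable (e.g. submitted with `qsub -r y`).

```python
import shlex
from typing import List

# Hypothetical inventory: a real site would read node names from its own
# hosts file or scheduler configuration.
NODES = [f"node{i:02d}" for i in range(1, 65)]  # 64 compute nodes


def ipmi_cmd(node: str, action: str, user: str, password_file: str) -> List[str]:
    """Build an ipmitool chassis-power command for one node's BMC.

    The "<node>-ipmi" BMC hostname convention is hypothetical.
    """
    return ["ipmitool", "-I", "lanplus", "-H", f"{node}-ipmi",
            "-U", user, "-f", password_file, "chassis", "power", action]


def on_battery_commands(nodes: List[str], user: str,
                        password_file: str) -> List[List[str]]:
    """Commands the head node would run, in order, once the UPS is on battery."""
    cmds = [
        # 1. Disable every queue instance so nothing new is dispatched.
        ["qmod", "-d", "*"],
        # 2. Reschedule (kill and requeue) the jobs running in every queue.
        #    Assumes the jobs were submitted as rerunnable.
        ["qmod", "-rq", "*"],
    ]
    # 3. Ask each BMC for a soft (ACPI) power-off so the OS shuts down cleanly.
    cmds += [ipmi_cmd(n, "soft", user, password_file) for n in nodes]
    return cmds


if __name__ == "__main__":
    # Dry run with hypothetical credentials: print the commands, don't execute them.
    for cmd in on_battery_commands(NODES, "admin", "/root/.ipmi_pass"):
        print(shlex.join(cmd))
```

Keeping the steps as plain command lists makes the order easy to review before wiring them into whatever hook the UPS vendor's software provides.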

The head node would stay up so that if the power outage was brief, I 
could log into it and turn on all the other nodes. If the UPS was 
running out of battery, the head node would shut itself down gracefully. 
On power restoration, the queues would remain disabled so that no jobs 
would start until I was confident power was permanently restored and 
there were no issues that I needed to fix before jobs could run. 
Enabling the queues manually was trivial, so this worked well.
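The recovery side can be sketched the same way: power the nodes back on, but only re-enable the queues once an administrator has confirmed that power is stable. Again a hedged dry-run sketch; the BMC hostnames, credentials and the `admin_confirmed` flag are hypothetical stand-ins for the manual check described above, and `qmod -e` is assumed to be the SGE command for re-enabling queues.

```python
import shlex
from typing import List


def ipmi_cmd(node: str, action: str, user: str, password_file: str) -> List[str]:
    """Build an ipmitool chassis-power command for one node's BMC.

    The "<node>-ipmi" BMC hostname convention is hypothetical.
    """
    return ["ipmitool", "-I", "lanplus", "-H", f"{node}-ipmi",
            "-U", user, "-f", password_file, "chassis", "power", action]


def power_restored_commands(nodes: List[str], user: str, password_file: str,
                            admin_confirmed: bool) -> List[List[str]]:
    """Commands to bring the cluster back once mains power returns.

    Nodes are powered on right away, but the queues stay disabled (so no
    jobs are dispatched) until an administrator explicitly confirms that
    power is stable and nothing needs fixing.
    """
    cmds = [ipmi_cmd(n, "on", user, password_file) for n in nodes]
    if admin_confirmed:
        # Re-enable every queue instance; the requeued jobs can now start.
        cmds.append(["qmod", "-e", "*"])
    return cmds


if __name__ == "__main__":
    # Dry run with hypothetical node names and credentials: print, don't execute.
    nodes = [f"node{i:02d}" for i in range(1, 5)]
    for cmd in power_restored_commands(nodes, "admin", "/root/.ipmi_pass",
                                       admin_confirmed=False):
        print(shlex.join(cmd))
```

Making the queue re-enable an explicit, separate decision mirrors the point of leaving the queues disabled: nothing runs until a human says so.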

For reasonably small clusters of diskful nodes, I think this is a 
good strategy. By having the nodes shut down gracefully, you don't have 
to wait for them to run fsck at boot, which can happen after an abrupt 
loss of power. Imagine if I had to get on the console of every node to 
press 'y' at some fsck prompt during boot.

However, as others have pointed out, for larger clusters this just isn't 
a practical approach.

Currently, I manage a Blue Gene/P, which is diskless/stateless, so the 
Blue Gene itself has no backup power of any kind, but the service nodes 
and file system are on UPS. My area was hit very hard by Hurricane Sandy 
last week, so I learned the hard way what still needs to be configured 
for my Blue Gene to shut down gracefully when the UPS runs out of battery.

--
Prentice