[Beowulf] Cluster doesn't like being moved

Mark Hahn hahn at mcmaster.ca
Tue Mar 10 12:05:50 PDT 2009

> moved (keep getting kicked out of the space I'm using), I end up with any
> number of different problems.

debugging is mainly about breaking down the system into components
whose correctness can be observed separately.

> Personally I suspect some type of hardware issue (this equipment is about 5
> years old), but one of my co-workers isn't so sure hardware is in play.  I
> was having problems with the RAID initializing after one move back which I
> resolved a while back by reseating the RAID controller card.

sounds a bit like black magic to me.  I don't believe I've ever had a problem
solved by card reseating (though dimm reseating does seem to clean up
40% of the nodes I see that are reporting a lot of corrected ecc's.)

> This time it appears that the file system & configuration databases became
> corrupted after moving the equipment. Several services aren't starting up
> (LDAP, DHCP, PBS to name a few) and YAST2 hangs any time an attempt is made

simplify.  to me, it sounds like your network (ip, route, dns) is confused.
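
as a rough sketch of what "checking the components separately" could look
like on the head node: the script below tests for a default route and for
hostname resolution independently, so one failure doesn't hide the other.
the function names are made up for illustration, and it assumes a linux box
that exposes its routing table in /proc/net/route.  treat it as a starting
point, not a diagnosis.

```python
import socket


def default_route_present(route_table="/proc/net/route"):
    """Return True if the table lists a route with destination 0.0.0.0.

    Assumes the Linux /proc/net/route layout: one header line, then
    whitespace-separated fields with the destination (hex) in column 2.
    """
    with open(route_table) as f:
        next(f, None)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) > 1 and fields[1] == "00000000":
                return True
    return False


def hostname_resolves(name=None):
    """Resolve a hostname (default: this machine's own) to an IPv4 address.

    Raises an OSError subclass if the lookup fails.
    """
    return socket.gethostbyname(name or socket.gethostname())


def run_checks(checks):
    """Run each (label, callable) pair in isolation.

    A check passes if it returns a truthy value without raising.
    Returns {label: (passed, detail)}.
    """
    results = {}
    for label, check in checks:
        try:
            value = check()
            results[label] = (bool(value), value)
        except Exception as exc:
            results[label] = (False, repr(exc))
    return results


if __name__ == "__main__":
    report = run_checks([
        ("default route", default_route_present),
        ("own hostname resolves", hostname_resolves),
    ])
    for label, (passed, detail) in report.items():
        print(f"{'ok  ' if passed else 'FAIL'} {label}: {detail}")
```

if the hostname check fails while the route check passes, that points at
name resolution (hosts file, resolv.conf, dns server) rather than at
the hardware.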

