[Beowulf] Cluster doesn't like being moved
hahn at mcmaster.ca
Tue Mar 10 12:05:50 PDT 2009
> moved (keep getting kicked out of the space I'm using), I end up with any
> number of different problems.
debugging is mainly about breaking down the system into components
whose correctness can be observed separately.
> Personally I suspect some type of hardware issue (this equipment is about 5
> years old), but one of my co-workers isn't so sure hardware is in play. I
> was having problems with the RAID initializing after one move back which I
> resolved a while back by reseating the RAID controller card.
sounds a bit like black magic to me. I don't believe I've ever had a problem
solved by reseating a card (though reseating dimms does seem to clean up
40% of the nodes I see that are reporting a lot of corrected ECC errors.)
> This time it appears that the file system & configuration databases became
> corrupted after moving the equipment. Several services aren't starting up
> (LDAP, DHCP, PBS to name a few) and YAST2 hangs any time an attempt is made
simplify. to me, it sounds like your network (ip, route, dns) is confused.
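
a rough sketch of what "simplify" could look like for the network case: check
each layer (ip, route, dns) separately, in order, and stop at the first one
that fails, since anything above a broken layer can't be trusted anyway. this
is only an illustration, not a tested diagnostic tool. the helper names
(run_checks, default_checks) and the host name passed in are made up for the
example; the probe target 192.0.2.1 is a reserved documentation address that
is never actually contacted.

```python
import socket


def run_checks(checks):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns (results, first_failure), where results is a list of
    (name, ok, detail) tuples for every check that was actually run and
    first_failure is the name of the failing check, or None.
    """
    results = []
    for name, check in checks:
        try:
            detail = check()
            results.append((name, True, str(detail)))
        except Exception as exc:
            results.append((name, False, f"{type(exc).__name__}: {exc}"))
            return results, name
    return results, None


def local_address():
    # Address the node's own hostname maps to. A loopback or old-subnet
    # answer suggests stale /etc/hosts or DNS data left over from before a move.
    return socket.gethostbyname(socket.gethostname())


def outbound_address(target="192.0.2.1"):
    # connect() on a UDP socket sends no packets; it only asks the kernel
    # which source address its routing table would use to reach target.
    # With no usable route this raises OSError.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((target, 9))
        return s.getsockname()[0]


def default_checks(head_node):
    # head_node is a hypothetical name for whichever host serves LDAP/DHCP/PBS.
    return [
        ("ip: own hostname resolves", local_address),
        ("route: kernel can pick an outbound address", outbound_address),
        ("dns: head node resolves", lambda: socket.gethostbyname(head_node)),
    ]
```

calling run_checks(default_checks("head-node")) on a compute node (with the
real head node name substituted) returns the per-layer results plus the name
of the first layer that failed, which is where to start looking before
touching LDAP, DHCP or PBS themselves.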