[Beowulf] Re: Cluster doesn't like being moved (Steve Herborn)

David Mathog mathog at caltech.edu
Tue Mar 10 12:38:16 PDT 2009

"Steve Herborn" <herborn at usna.edu> wrote:

> I have a small test cluster built off Novell SUSE Enterprise Server 10.2
> that is giving me fits.  It seems that every time the hardware is
> moved (keep getting kicked out of the space I'm using), I end up with any
> number of different problems. 

Off the top of my head...

1.  Motherboard batteries may be dying or dead.  If so, BIOS settings
can be lost whenever a machine is unplugged during the move (or shut
down for any extended period of time), and that leads to failures.

2.  iffy wiring connections of any type (cards, data cables, power
supply cables, jumpers from case to motherboard, etc.) 

> Personally I suspect some type of hardware issue (this equipment is
> about 5 years old), but one of my co-workers isn't so sure hardware is
> in play.  I was having problems with the RAID initializing after one
> move back which I resolved a while back by reseating the RAID
> controller card.

That would be consistent with (2).  If moving involves any "rolling on
small wheels over rough surfaces", failed electrical connections are a
common result.  We have a cart with about 10" inflated tires which is
used to move equipment, specifically to minimize this issue.  The last 2
racks we moved were completely disassembled: the frame was moved first,
then the nodes were moved to it on this cart.


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

