How do you keep clusters running....

Steve Gaudet SGaudet at
Wed Apr 3 13:26:28 PST 2002

Hello Chris,

> What are folks doing about keeping hardware running on large clusters?
> Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)...
> Sure seems like every week or two, I notice dead fans (each RS-1200
> has 6 case fans in addition to the 2 CPU fans and 2 power supply fans).
> My last fan failure was a CPU fan that toasted the CPU and motherboard.
> How are folks with significantly more nodes than mine dealing with
> constant maintenance on their nodes?  Do you have whole spare nodes
> sitting around- ready to be installed if something fails, or do you
> have a pile of spare parts?  Did you get the vendor (if you purchased
> prebuilt systems) to supply a stockpile of warranty parts?
> One of the problems I'm facing is that every time something croaks,
> Racksaver is very good about replacing it under warranty, but getting
> the new parts delivered usually takes several days.
> For some things like fans, they sent extras for me to keep on-hand.
> For my last fan/CPU/motherboard failure, the node pair will be down
> ~5 days waiting for parts.
> Comments? Thoughts? Ideas?

The vendor of choice should be using quality parts.  We don't see these
issues here.

Steve Gaudet 
Linux Solutions Engineer
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet at  |
|                            web: |


More information about the Beowulf mailing list