[Beowulf] Re: Why Do Clusters Suck?

Craig Tierney ctierney at HPTI.com
Tue Mar 22 15:28:44 PST 2005


On Tue, 2005-03-22 at 16:12, Stuart Midgley wrote:
> Of course, you can buy expensive hardware (take our AUD10mil 
> Alphaserver SC - based on 127 ES45's for example) and still have every 
> CPU replaced, every dimm replaced and every disk replaced over 3 years 
> :)  We are still losing a node disk (10k rpm Ultra SCSI 320 disks) 
> roughly every 3-4 days.  Fortunately, our Quadrics has never failed.
> 

> Contrast this to our 152 node Dell cluster (single cpu PR350's), which 
> has had a couple of disks and a couple of fans fail in 2 years...
> 

Corollary to "when you buy cheap hardware you get cheap hardware":

"when you buy expensive hardware it still might be cheap".

We had a similar problem with Compaq XP1000 workstations.
Every time we turned the system off we would lose 5-10 power
supplies (out of 275).

A couple years later we bought some whitebox systems.  Three
times the node count with 1/10 the total hardware problems.

Craig


> Stu.
> 
> 
> > For the nodes themselves, you can buy systems with redundant power and
> > redundant disk.  You can buy from a system vendor that qualifies
> > their hardware much more rigorously than another.  When you buy cheap
> > hardware you get cheap hardware.
> 
> --
> <--------------------------------------------------------------------->
>    Dr Stuart Midgley                   |  stuart.midgley at anu.edu.au
>    Supercomputer Facility              |  smidgley at netspace.net.au
>    Leonard Huxley Building 56          |  +61 (0)2 6125 5988   Work
>    Australian National University      |  +61 (0)2 6125 8199   Fax
>    CANBERRA   ACT   0200               |  +61 (0)4 1125 2488   Mob
> 
