[Beowulf] Re: Why Do Clusters Suck?
Craig Tierney
ctierney at HPTI.com
Tue Mar 22 15:28:44 PST 2005
On Tue, 2005-03-22 at 16:12, Stuart Midgley wrote:
> Of course, you can buy expensive hardware (take our AUD10mil
> Alphaserver SC - based on 127 ES45's for example) and still have every
> CPU replaced, every dimm replaced and every disk replaced over 3 years
> :) We are still losing a node disk (10k rpm Ultra SCSI 320 disks)
> roughly every 3-4 days. Fortunately, our Quadrics has never failed.
>
> Contrast this to our 152 node Dell cluster (single cpu PR350's), which
> has had a couple of disks and a couple of fans fail in 2 years...
>
Corollary to "when you buy cheap hardware you get cheap hardware":
"when you buy expensive hardware it still might be cheap".
We had a similar problem with Compaq XP1000 workstations.
Every time we turned the system off we would lose 5-10 power
supplies (out of 275).
A couple years later we bought some whitebox systems. Three
times the node count with 1/10 the total hardware problems.
Craig
> Stu.
>
>
> > For the nodes themselves, you can buy systems with redundant power and
> > redundant disk. You can buy from a system vendor that qualifies
> > their hardware much more rigorously than another. When you buy cheap
> > hardware you get cheap hardware.
>
> --
> <--------------------------------------------------------------------->
> Dr Stuart Midgley | stuart.midgley at anu.edu.au
> Supercomputer Facility | smidgley at netspace.net.au
> Leonard Huxley Building 56 | +61 (0)2 6125 5988 Work
> Australian National University | +61 (0)2 6125 8199 Fax
> CANBERRA ACT 0200 | +61 (0)4 1125 2488 Mob
>