disadvantages of a linux cluster
Robert G. Brown
rgb at phy.duke.edu
Wed Nov 6 06:18:38 PST 2002
On Tue, 5 Nov 2002, Mark Hahn wrote:
> > 256-processor Intel clusters (home grown apps). We run in parallel with
> > MPI Pro and Cluster Controller and Windows 2000. Reliability is 5-nines;
> > manageability tools have helped us to reduce systems administration
> > costs/staff.
>
> so what would be the list price of that software? do you have any
> data on how reliability would compare with a linux approach?
> also, .99999 is impressive, only 5 minutes a year; how long have
> you had the cluster? is that .99999 counted for all nodes,
> or do you mean "at least some nodes worked for .99999 of the time"?
>
> if you really mean that the sum of all downtime (across all 256 nodes)
> is 5 minutes/year, that's truly remarkable!
I agree. In fact, hardware alone is a lot less reliable than that.
You've been amazingly lucky. Even with Dell hardware we've never gone a
year without some sort of hardware failure that involved a day or so of
downtime (or expensive onsite service contracts and/or lots of spare
parts sitting around), and one day contains 1440 minutes, which works
out to more than five minutes per node for 256 nodes. Just diagnosing a
failed part (like a bad memory DIMM, a crashed disk, or a burned
motherboard) usually takes a few hours. So you've either really got
(effectively) 258 systems, with a couple of them functioning as
more-or-less hot spares, or you've had phenomenally good luck.
If the former, you might try computing your uptime with the hot spares
included in the node count.
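
To make the arithmetic concrete, here is a quick back-of-the-envelope
sketch (Python; the 256-node count and the one-day repair figure are
taken from above, while the per-node averaging is just my assumption
about how the five-nines figure was counted):

    # How much downtime does "five nines" actually allow per year,
    # and how does a single day of diagnosis/repair compare when
    # averaged over 256 nodes?
    MINUTES_PER_YEAR = 365.25 * 24 * 60      # ~525,960 minutes
    availability = 0.99999                   # "five nines"
    nodes = 256

    allowed = MINUTES_PER_YEAR * (1 - availability)
    print(f"five-nines budget: {allowed:.2f} min per node-year")       # ~5.26

    one_day = 24 * 60                        # one day lost to diagnosis
    print(f"one day over {nodes} nodes: {one_day / nodes:.2f} min/node")  # ~5.63

A single day of diagnosis, spread over the whole cluster, already
exceeds the per-node five-nines budget for the year.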
rgb
>
> thanks, mark hahn.
>
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email: rgb at phy.duke.edu