disadvantages of a linux cluster
Jim Lux
James.P.Lux at jpl.nasa.gov
Wed Nov 6 16:21:32 PST 2002
>
> b) Uptime, measured as (total time systems are booted into the OS and
>available for numerical tasks / total amount of time ALL systems have
>been around).
>
>This means that if you have 9 systems booted and a hot spare, the best
>you can count for uptime is 90%. It also means that if a system crashes
>in the middle of the night and you don't get around to fixing it until
>the next day, you lose eight or twelve hours, not the ten minutes it
>eventually takes you to fix it after discovering the crash, pulling the
If the cluster were claimed to have 9 processors' worth of processing
capability, and the OS and scheduler allow transparent use of the hot
spare, then you could get 100% uptime as long as you had at most one
failure. One could implement this in two generalized ways: 10 processors
each running at 90% (that is, 9 processors' worth of "work"), or 9
processors running full tilt with one sitting idle. There are performance
and reliability tradeoffs: running full tilt runs hotter, which increases
failure probability... but then, the idle unit stays relatively cold and
isn't "consuming life".
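To put rough numbers on the hot-spare argument, here's a small sketch
(assuming independent node failures and a hypothetical per-node
availability of 0.99; both figures are illustrative, not measurements).
With a transparent spare, delivering 9 nodes of work only requires any 9
of 10 nodes to be up, rather than a specific 9 of 9:

```python
from math import comb

def prob_at_least(n, k, a):
    """Probability that at least k of n independent nodes are up,
    given each node is up with probability a."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))

a = 0.99  # hypothetical per-node availability (an assumption)

# 9 nodes, no spare: all 9 must be up to deliver 9 nodes' worth of work
no_spare = a**9

# 10 nodes with a transparent hot spare: any 9 of the 10 suffice
with_spare = prob_at_least(10, 9, a)

print(f"no spare:  {no_spare:.6f}")
print(f"hot spare: {with_spare:.6f}")
```

Even a single spare moves the figure from roughly 91% to over 99% under
these assumptions, which is the "100% uptime as long as you had at most
one failure" effect in probabilistic terms.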
In an extreme case, say you had mirroring: two complete copies of the
cluster running in parallel. It's not efficient (by some metric), but it
is potentially highly reliable. And as the Pfail of a given cluster goes
up, it starts to be worth paying the increased cost (computationally) of
finer-grained redundancy (i.e., the per-node overhead goes up, but you
compensate by using more nodes).
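The coarse-versus-fine-grain tradeoff can also be sketched numerically
(again assuming independent failures and a hypothetical per-node
availability; the 0.95 figure and the pairing scheme are illustrative
assumptions, not anything from the original post):

```python
a = 0.95  # hypothetical per-node availability (an assumption)
n = 9     # logical nodes' worth of work required

# One copy of the cluster: all n nodes must be up
cluster_up = a**n

# Coarse grain: two full mirrored clusters; works if either copy is fully up
mirror = 1 - (1 - cluster_up)**2

# Finer grain: each logical node backed by its own spare;
# a logical node works if either member of its pair is up
per_node = (1 - (1 - a)**2)**n

print(f"single cluster:    {cluster_up:.6f}")
print(f"full mirror:       {mirror:.6f}")
print(f"per-node pairing:  {per_node:.6f}")
```

For the same doubling of hardware, the finer-grained scheme comes out
well ahead here, because a full mirror wastes an entire copy on any
single-node failure while per-node spares only have to cover their own
partner.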
Interestingly, Cornell has produced some work in this area (Spinglass,
for instance), but it's unclear whether it's being used in a production
environment.
More information about the Beowulf mailing list