Uptime data/studies/anecdotes ... ?

Roger L. Smith roger at ERC.MsState.Edu
Tue Apr 2 08:15:00 PST 2002


We currently average about 75% utilization on our 586-processor
(293-node) cluster.  About one node per week crashes or hangs, for
various reasons.
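
For a rough sense of scale, here is a back-of-the-envelope sketch of what
those figures imply.  It is only an illustration: the function names are
made up, the 8760-hour year is the figure used in the quoted question
below, and the per-node estimate assumes crashes are spread evenly across
all nodes.

```python
# Back-of-the-envelope arithmetic for the figures quoted in this message.
# Assumption: an 8760-hour year (the figure used in the quoted question).
HOURS_PER_YEAR = 8760


def busy_processor_hours(processors, utilization, hours=HOURS_PER_YEAR):
    """Processor-hours actually spent on jobs over the given period."""
    return processors * utilization * hours


def weeks_between_crashes_per_node(nodes, crashes_per_week):
    """Average weeks between crashes for a single node.

    Assumption: crashes are spread evenly across all nodes.
    """
    return nodes / crashes_per_week


if __name__ == "__main__":
    print(busy_processor_hours(586, 0.75))           # busy processor-hours per year
    print(weeks_between_crashes_per_node(293, 1.0))  # weeks between crashes, per node
```

In other words, one crash per week across 293 nodes means any single node
goes several years between crashes on average, if the even-spread
assumption holds.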

We have occasional problems with memory leaks or PBS hangups that require
large-scale reboots of the cluster. (Actually, PBS just died as I'm typing
this, but our PBS heartbeat script should restart it automatically in a
few minutes.)  I'd say we have to do a full reboot of the cluster about
every 3-4 months.
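
A heartbeat script of this kind can be quite small.  The following is a
minimal sketch of the general idea, not the actual script we run: it
assumes the daemon can be found by exact process name with pgrep, and the
process name, restart command, and polling interval are all placeholders
to be filled in for the local installation.

```python
import subprocess
import time


def is_running(process_name):
    """Return True if pgrep finds a process with exactly this name."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def heartbeat(check, restart):
    """Run one heartbeat: call restart() only if check() says the daemon is down.

    Returns True if a restart was attempted, False otherwise.
    """
    if check():
        return False
    restart()
    return True


def watch(process_name, restart_cmd, interval_s=300):
    """Poll forever, restarting the daemon whenever it is found dead.

    process_name and restart_cmd are site-specific placeholders
    (assumptions), e.g. the scheduler daemon's name and whatever
    command starts it locally.
    """
    while True:
        heartbeat(
            lambda: is_running(process_name),
            lambda: subprocess.run(restart_cmd, check=False),
        )
        time.sleep(interval_s)
```

Keeping the check and the restart as separate callables makes the logic
easy to try out with stand-ins before pointing it at a real daemon.  Note
that a check like this only catches a daemon that has died outright; one
that is hung but still running would need a different test.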

For a bunch of PC hardware running a free OS, this seems like a pretty
good number to me.  It's not in the same class as our Sun servers (nor
even our SGIs!), but then, none of those systems are this large, either.



On Tue, 2 Apr 2002, Richard Walsh wrote:

>
> All,
>
> What information is available on typical uptimes
> of large-scale clusters ... say greater than 256
> processors and running a multi-user workload? What
> gains do single-point-of-administration tools like
> SCYLD provide?  Clearly, there are a great number
> of things one can do to maximize uptime/utilization
> (not the same thing really).  What are the essentials
> from the list's point of view?
>
> If a good figure is, say, 80% utilization over an
> 8760-hour year today, what will this number be in
> three years?  Annual utilization for the 1088-processor
> T3E we run here is about 95%.  How long until a similarly
> sized cluster typically yields the same value?
>
> Regards,
>
> rbw
>
> #---------------------------------------------------
> #
> # Richard Walsh
> # Project Manager, Cluster Computing, Computational
> #                  Chemistry and Finance
> # netASPx, Inc.
> # 1200 Washington Ave. So.
> # Minneapolis, MN 55415
> # VOX:    612-337-3467
> # FAX:    612-337-3400
> # EMAIL:  rbw at networkcs.com, richard.walsh at netaspx.com
> #
> #---------------------------------------------------
> # "What you can do, or dream you can, begin it;
> #  Boldness has genius, power, and magic in it."
> #                                  -Goethe
> #---------------------------------------------------
> # "Without mystery, there can be no authority."
> #                                  -Charles DeGaulle
> #---------------------------------------------------
> # "Why waste time learning when ignorance is
> #  instantaneous?"                 -Thomas Hobbes
> #---------------------------------------------------
>


 _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
| Roger L. Smith                        Phone: 662-325-3625               |
| Systems Administrator                 FAX:   662-325-7692               |
| roger at ERC.MsState.Edu                 http://WWW.ERC.MsState.Edu/~roger |
|                       Mississippi State University                      |
|_______________________Engineering Research Center_______________________|
