[Beowulf] Cluster Metrics? (Upper management view)


Prentice Bisbal prentice at ias.edu
Fri Aug 20 12:05:44 PDT 2010


I couldn't have said it better myself. Be wary of suits asking for
numbers.



Michael Di Domenico wrote:
> I think measuring a cluster's success based on the number of jobs run
> or CPUs used is a bad measure of true success.  I would be more
> inclined to judge a cluster a success by speaking with the people
> who use it and finding out not only whether they can use it effectively
> but also what new science the cluster is enabling for them.
> 
> The only thing I find most of the metrics below useful for is
> figuring out whether or not we need a bigger cluster, which I guess
> is a form of measurable success, but not one by which I would consider
> the "cluster" to be a success.  It could just be dopes running
> thousands of "/bin/hostname" jobs trying to figure out how to use the
> cluster.
> 
> I also think you need to ask the "business" people by what measure they
> would consider a cluster a worthwhile investment; it doesn't sound
> as if you have that from your email.
> 
> 
> 
> On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley <stuartb at 4gh.net> wrote:
>> What sort of business management level metrics do people measure on
>> clusters?  Upper management is asking for us to define and provide
>> some sort of "numbers" which can be used to gauge the success of our
>> cluster project.
>>
>> We currently have both SGE and Torque/Moab in use and need to measure
>> both if possible.
>>
>> I can think of some simple metrics (well sort-of, actual technical
>> definition/measurement may be difficult):
>>
>> - 90/95th percentile wait time for jobs in various queues.  Is smaller
>> better meaning the jobs don't wait long and users are happy?  Is
>> larger better meaning that we have lots of demand and need more
>> resources?
>>
>> - core-hours of user computation (per queue?) both as raw time and
>> percentage of available time.  Again, which is better (management
>> view) higher or lower?
>>
>> - Availability during scheduled hours (ignoring scheduled maintenance
>> times).  Common metric, but how do people actually measure/compute
>> this?  What about down nodes?  Some scheduled percentage (5%?) assumed
>> down?
>>
>> - Number of new science projects performed.  Vague, but our
>> applications support people can just count things occasionally.
>> Misses users who just use the system without interaction with us.
>> Misses "production" work that just keeps running.
>>
>> Any comments or ideas are welcome.
>>
>> Thanks,
>> Stuart Barkley
>> --
>> I've never been lost; I was once bewildered for three days, but never lost!
>>                                        --  Daniel Boone
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
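
For anyone who does end up having to produce the numbers Stuart lists, here is a rough sketch of how the first three (percentile wait time, core-hours as a fraction of available time, and availability) might be computed. It assumes the job records have already been pulled out of the scheduler's accounting data into a simple list; the `Job` fields and all function names are made up for illustration and do not correspond to the SGE or Torque/Moab accounting formats. It says nothing about whether higher or lower is "better", which is the real question in this thread.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Job:
    # Hypothetical record layout; times are in seconds since some epoch.
    submit: float
    start: float
    end: float
    cores: int


def wait_percentile(jobs, pct):
    """Nearest-rank percentile of queue wait time (start - submit), in seconds."""
    waits = sorted(j.start - j.submit for j in jobs)
    if not waits:
        raise ValueError("no jobs to measure")
    rank = max(1, ceil(pct / 100 * len(waits)))
    return waits[rank - 1]


def core_hours(jobs):
    """Total core-hours consumed: run time multiplied by cores, summed over jobs."""
    return sum((j.end - j.start) * j.cores for j in jobs) / 3600


def utilization(jobs, total_cores, window_hours):
    """Core-hours used as a fraction of core-hours available in the window."""
    return core_hours(jobs) / (total_cores * window_hours)


def availability(scheduled_node_hours, down_node_hours):
    """Fraction of scheduled node-hours during which nodes were actually up."""
    return 1 - down_node_hours / scheduled_node_hours


# Made-up sample data: four jobs on an assumed 32-core cluster over 4 hours.
jobs = [
    Job(submit=0, start=60, end=3660, cores=4),
    Job(submit=0, start=600, end=7800, cores=8),
    Job(submit=100, start=400, end=4000, cores=2),
    Job(submit=200, start=3800, end=7400, cores=16),
]

p90_wait = wait_percentile(jobs, 90)
used = core_hours(jobs)
util = utilization(jobs, total_cores=32, window_hours=4)
avail = availability(scheduled_node_hours=1000, down_node_hours=50)

print(f"90th percentile wait: {p90_wait:.0f} s")
print(f"core-hours used: {used:.1f} ({util:.1%} of available)")
print(f"availability: {avail:.1%}")
```

The nearest-rank percentile is only one of several common definitions, and counting down nodes by node-hours is just one way to answer the "what about down nodes?" question; whichever definitions are chosen should be written down next to the numbers before they go to management.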


