[Beowulf] Cluster Metrics? (Upper management view)
Prentice Bisbal
prentice at ias.edu
Fri Aug 20 12:05:44 PDT 2010
I couldn't have said it better myself. Be wary of suits asking for numbers.
Michael Di Domenico wrote:
> I think measuring a cluster's success by the number of jobs run
> or CPUs used is a poor measure of true success. I would be more
> inclined to judge a cluster's success by speaking with the people
> who use it and finding out not only whether they can use it effectively
> but also what new science having the cluster enables for them.
>
> The only thing I find most of the metrics below useful for is
> figuring out whether or not we need a bigger cluster, which I guess
> is a form of measurable success, but not one by which I would consider
> the "cluster" a success. It could just be dopes running
> thousands of "/bin/hostname" jobs trying to figure out how to use the
> cluster.
>
> I also think you need to ask the "business" people by what measure they
> would consider the cluster a worthwhile investment; from your email, it
> doesn't sound as if you have that.
>
> On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley <stuartb at 4gh.net> wrote:
>> What sort of business management level metrics do people measure on
>> clusters? Upper management is asking for us to define and provide
>> some sort of "numbers" which can be used to gauge the success of our
>> cluster project.
>>
>> We currently have both SGE and Torque/Moab in use and need to measure
>> both if possible.
>>
>> I can think of some simple metrics (well, sort of; the actual technical
>> definition/measurement may be difficult):
>>
>> - 90th/95th percentile wait time for jobs in various queues. Is smaller
>> better meaning the jobs don't wait long and users are happy? Is
>> larger better meaning that we have lots of demand and need more
>> resources?
>>
>> - core-hours of user computation (per queue?) both as raw time and
>> percentage of available time. Again, which is better from a management
>> view: higher or lower?
>>
>> - Availability during scheduled hours (ignoring scheduled maintenance
>> times). Common metric, but how do people actually measure/compute
>> this? What about down nodes? Some scheduled percentage (5%?) assumed
>> down?
>>
>> - Number of new science projects performed. Vague, but our
>> applications support people can just count things occasionally.
>> Misses users who just use the system without interaction with us.
>> Misses "production" work that just keeps running.
>>
>> Any comments or ideas are welcome.
>>
>> Thanks,
>> Stuart Barkley
>> --
>> I've never been lost; I was once bewildered for three days, but never lost!
>> -- Daniel Boone
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
>
--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
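Stuart's first three metrics (percentile wait time, core-hour utilization, and availability during scheduled hours) can all be computed from per-job scheduler accounting data. The following is a minimal Python sketch, not a tool from this thread: the `Job` record and the functions `percentile`, `wait_percentiles`, `utilization`, and `availability` are hypothetical, and it assumes the SGE accounting file or the Torque/Moab logs have already been parsed into (queue, submit, start, end, cores) records with times in epoch seconds. It uses the nearest-rank percentile definition, clips jobs to the reporting window when counting core-hours, and excludes scheduled maintenance from the availability denominator. That last choice is one possible answer to Stuart's question, not a standard definition.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Job:
    # Hypothetical record: assumes SGE or Torque/Moab accounting data
    # has already been parsed; times are epoch seconds.
    queue: str
    submit: float
    start: float
    end: float
    cores: int


def percentile(values, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def wait_percentiles(jobs, p=95):
    """Per-queue p-th percentile of queue wait (start - submit), in hours."""
    waits = {}
    for job in jobs:
        waits.setdefault(job.queue, []).append((job.start - job.submit) / 3600)
    return {queue: percentile(w, p) for queue, w in waits.items()}


def utilization(jobs, total_cores, t0, t1):
    """Fraction of available core-time in [t0, t1) consumed by jobs.

    Each job is clipped to the window, so a long-running job is only
    counted for the part that falls inside this reporting period.
    """
    used = 0.0
    for job in jobs:
        overlap = min(job.end, t1) - max(job.start, t0)
        if overlap > 0:
            used += overlap * job.cores
    return used / (total_cores * (t1 - t0))


def availability(n_nodes, window_hours, maintenance_hours, unscheduled_down_node_hours):
    """Node-weighted availability over scheduled hours.

    Assumes scheduled maintenance takes every node down, so it is removed
    from the denominator; only unscheduled node downtime counts against
    the cluster.
    """
    scheduled_node_hours = n_nodes * (window_hours - maintenance_hours)
    return 1 - unscheduled_down_node_hours / scheduled_node_hours


if __name__ == "__main__":
    H = 3600
    jobs = [
        Job("short", 0, 1 * H, 3 * H, 8),
        Job("short", 0, 2 * H, 4 * H, 8),
        Job("long", 0, 0, 10 * H, 16),
    ]
    print(wait_percentiles(jobs, 95))  # {'short': 2.0, 'long': 0.0}
    print(utilization(jobs, total_cores=32, t0=0, t1=10 * H))
    print(availability(n_nodes=100, window_hours=720, maintenance_hours=20,
                       unscheduled_down_node_hours=700))
```

As Michael points out above, a high utilization figure could just as easily come from thousands of trivial `/bin/hostname` jobs, so numbers like these measure demand and uptime rather than scientific output.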