[Beowulf] SC13 wrapup, please post your own

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Tue Nov 26 06:15:28 PST 2013

On 11/25/13 4:11 PM, "Adam DeConinck" <ajdecon at ajdecon.org> wrote:

>> 4. I went to a BoF on ROI on HPC investment. All the presentations in
>> the BoF frustrated me. Not because they were poorly done, but because
>> they tried to measure the value of a cluster by number of papers
>> published that used that HPC resource. I think that's a crappy, crappy
>> metric, but haven't been able to come up with a better one myself yet.
>> I was very vocal with my comments and criticisms of the presentations, so
>> if any of the presenters are reading this now, I apologize for
>> hi-jacking your BoF. Getting good ROI on a cluster is close to my heart,
>> but is also difficult to quantify and measure. I hope I can be part of
>> the discussion next year.
>Do you have any thoughts you can share on what alternative metrics
>might look like, even if you can't think of one that's clearly better?
>I have no horse in this race as I've been doing industry HPC for the
>past few years, but I'm curious what good metrics for ROI on an academic
>or lab cluster might be. Total number of papers? Number of
>citations after an N-year time window? [shrug]
>ROI measurement can sometimes be difficult even in an industrial or
>commercial setting, especially if the HPC resource is used for R&D or
>"engineering support" as opposed to something that feeds directly into
>the product.

Definitely a challenge.

Maybe we have webcams that look at all the users and we calculate
percentage of time smiling while interacting with the cluster?

ROI for "technology development" is a tough thing to calculate.  ROI, by
its nature, is "money returned for money spent", and the return here is
somewhat intangible.

All of these metrics require having a baseline so you can do a
before/after comparison.  And realistically, there needs to be a fairly
long averaging time on the metric.  Here's the annual paper output of a
noted physicist.  

1901  1
1902  2
1903  1
1904  1
1905 25
1906  6
1907  8
1908  4
1909  5
1910  6
1911  8
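To make the averaging-time point concrete, here is a minimal Python sketch that runs the numbers above two ways: a raw year-over-year change, and a trailing mean. The function names and the 5-year window are my own assumptions for illustration, not any established ROI method.

```python
# Annual paper counts copied from the table above.
papers = {1901: 1, 1902: 2, 1903: 1, 1904: 1, 1905: 25, 1906: 6,
          1907: 8, 1908: 4, 1909: 5, 1910: 6, 1911: 8}


def year_over_year(counts):
    """Change in output relative to the previous year (hypothetical helper)."""
    years = sorted(counts)
    return {cur: counts[cur] - counts[prev]
            for prev, cur in zip(years, years[1:])}


def trailing_mean(counts, window):
    """Mean output over the `window` years ending at each year.

    Years without a full window of history are omitted.
    """
    years = sorted(counts)
    result = {}
    for i in range(window - 1, len(years)):
        span = years[i - window + 1:i + 1]
        result[years[i]] = sum(counts[y] for y in span) / window
    return result


# The 5-year window is an arbitrary choice for this sketch.
deltas = year_over_year(papers)
smoothed = trailing_mean(papers, window=5)

for year in sorted(smoothed):
    print(f"{year}  raw={papers[year]:2d}  "
          f"delta={deltas[year]:+3d}  5yr_mean={smoothed[year]:.1f}")
```

The year-over-year column swings from a large gain in 1905 to a large drop in 1906, while the trailing mean moves far less. The window length is a judgment call: too short and a single outlier year dominates, too long and you wait years before any before/after comparison is possible.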

How would you evaluate the ROI of feeding him?  Started kind of slow, had
a really good year, and likely received an "exceeds expectations" annual
review. But that set a new bar, and now his supervisor is going to be
hammering him: "Dude, your output is slacking off. I think we need to put
you on a performance improvement plan, and this year, you're going to be
'does not meet' in your review."

The other problem is that paper publishing (and the schedule thereof) is
influenced by things other than availability of computational resources,
so you need a very large sample so those influences average out.  For
instance, lack of funds or permission to travel to a conference, combined
with the recent fad of "you must present in person" will have an effect.
The sequester and/or furlough will almost certainly manifest itself in any
sort of time series counting publications.

The other thing is that there is a long gestation period for some work.
You might not have something "publishable", especially with the bias
against publishing null or negative results. That doesn't mean that the
HPC work wasn't useful, if it found a bunch of "ways not to go".

There might also be an issue with the availability of workforce to grind
out the papers.  At least at JPL, relatively few people work on a single
job or task.  A more typical scenario is having 2 or 3 projects you work on
simultaneously, along with half a dozen things you support. There is a
tendency to spend one's time on the latest thing to go wrong, and in a "do
more with less" environment, there's not a lot of down time in which to
catch up.

In an environment where short term results are more important (or at
least have more "gain" in the control loop), it's tough to push "getting
published" higher up the priority list, since the personal ill effects of
not publishing may be years down the road, compared to immediate ill
effects of "the project will be cancelled if we don't make the deliverable
this month".   

At JPL, it is easy to tell in which organizations the "papers published"
metric is important in annual ranking and review: they're the ones with
lots of papers.  That's not to say that other organizations don't do lots
of publishable work, but if your annual review depends on something
*other* than that metric, you're not going to spend your time writing
papers.

Export controls also rear their ugly head.  A lot of interesting problems
that can be attacked by HPC are the practical ones. But in a number of
industries, once you move beyond pure research and theory (TRL 3), you get
into an area where it is either competition sensitive or export
controlled.  It is true that you could write a paper that is suitably
expurgated and sanitized, but you still have to go through the export
control/public release review process. And that's time consuming too.

The whole competition sensitive/proprietary rights/export controls aspect
might be why some of you have commented on the gulf between what's
presented on the show floor and what's presented in the talks.
