[Beowulf] Roadrunner shutdown

Fri Apr 5 14:53:56 PDT 2013

Brian,

>  Yes, if we're talking about developing entirely new methods.  But there's a
> /ton/ of low-hanging fruit that exists in the mix of systems tuning, compiler
> options, /basic/ code changes (not anything deep or time-consuming), etc.,
> that takes hours or, at most, a few days, and can have massive impacts.  The
> serial I/O -> parallel I/O example way (way, way) above being one example,
> and as another, I can't tell you the number of times I've seen people running
> production runs with '-O0 -g' as their compilation flags.  Or, not using
> tuned BLAS / LAPACK libraries.  Or running an OpenMP-enabled code with 1
> process per node but having OMP_NUM_THREADS set to 1 instead of 8.  Or
> countless other things.

Hours would be optimistic, unless you exclude the initial
profiling and other analysis.  In days one can often find
significant gains, at least in single-threaded performance.
As a general rule, I allow one week per application for
single-threaded optimization, and two weeks for parallel
optimization.

Of course, the more time one spends, the more one finds.
I am invariably limited by deadlines and/or budgets; but I
have never run out of ideas, even after months!

But I am usually given applications for which some attempts
at optimization have been made, so I take your point about
those really dumb mistakes.  They bring me joy. :)
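
Several of the mistakes you list can be caught mechanically
before a production run.  Here is a minimal sketch in Python,
not a real tool: the function name, its arguments, and the
warning strings are all made up for illustration.  It assumes
a GCC-style compiler driver (the last -O option wins, and no
-O option means -O0) and one process per node, as in your
OpenMP example.

```python
import shlex

NO_OPT = "compiled without optimization (-O0)"


def preflight_warnings(cflags, env, cores_per_node):
    """Return warnings for a few common low-hanging-fruit mistakes.

    Hypothetical helper: cflags is the compiler flag string, env is a
    dict of environment variables, cores_per_node is the core count.
    """
    warnings = []
    flags = shlex.split(cflags)

    # Assumption: GCC-style driver, where the last -O option is the
    # effective one and a missing -O option defaults to -O0.
    opt_levels = [f for f in flags if f.startswith("-O")]
    level = opt_levels[-1] if opt_levels else "-O0"
    if level == "-O0":
        warnings.append(NO_OPT)

    # OMP_NUM_THREADS may be a comma-separated list for nested
    # parallelism; only the first (outermost) value is checked here.
    # An unset variable is left alone, since the runtime default is
    # implementation-defined.
    raw = env.get("OMP_NUM_THREADS")
    if raw is not None:
        try:
            threads = int(raw.split(",")[0])
        except ValueError:
            warnings.append("OMP_NUM_THREADS is not an integer: %r" % raw)
        else:
            if threads < cores_per_node:
                warnings.append(
                    "OMP_NUM_THREADS=%d but the node has %d cores"
                    % (threads, cores_per_node)
                )
    return warnings


if __name__ == "__main__":
    for w in preflight_warnings("-O0 -g", {"OMP_NUM_THREADS": "1"}, 8):
        print("warning:", w)
```

A check like this finds only the dumbest mistakes; it is no
substitute for profiling.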

(By the way, use of '-O0' can have another unintended
consequence: because '-O2' is often the most heavily tested
level, the compiler may expose more of its own bugs at '-O0',
so it is NOT the safest optimization setting.)

Benchmarkers do not need to be on staff, so scale their
cost to the project term and reevaluate the cost equation.

Cheers,

Max