[Beowulf] Roadrunner shutdown
Max R. Dechantsreiter
max at performancejones.com
Fri Apr 5 14:53:56 PDT 2013
> Yes, if we're talking about developing entirely new methods. But there's a
> /ton/ of low-hanging fruit that exists in the mix of systems tuning, compiler
> options, /basic/ code changes (not anything deep or time-consuming), etc.,
> that takes hours or, at most, a few days, and can have massive impacts. The
> serial I/O -> parallel I/O example way (way, way) above being one example,
> and as another, I can't tell you the number of times I've seen people running
> production runs with '-O0 -g' as their compilation flags. Or, not using
> tuned BLAS / LAPACK libraries. Or running an OpenMP-enabled code with 1
> process per node but having OMP_NUM_THREADS set to 1 instead of 8. Or
> countless other things.
Hours would be optimistic, unless you exclude the initial
profiling and other analysis. In days one can often find
significant gains, at least in single-threaded performance.
As a general rule, I allow one week per application for
single-threaded optimization, and two weeks for parallel optimization.
Of course, the more time one spends, the more one finds.
I am invariably limited by deadlines and/or budgets; but I
have never run out of ideas, even after months!
But I am usually given applications for which some attempts
at optimization have been made, so I take your point about
those really dumb mistakes. They bring me joy. :)
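For what it's worth, a couple of the mistakes listed in the quoted text are cheap to check for before a production run. What follows is only a rough sketch under stated assumptions, not anyone's actual tooling: `preflight_warnings` is a hypothetical helper, the compile flags are assumed to be available as a plain string, and the cores-per-node and processes-per-node counts are supplied by the caller.

```python
import os


def preflight_warnings(env, cflags, cores_per_node, procs_per_node=1):
    """Return warnings for a few common misconfigurations.

    Hypothetical helper for illustration only. `env` is a mapping such as
    os.environ, and `cflags` is the compile-flag string used for the build.
    """
    warnings = []

    # An explicit -O0 in a production build is almost never intended.
    if "-O0" in cflags.split():
        warnings.append("built with -O0; production runs usually want -O2 or higher")

    # OMP_NUM_THREADS may be a comma-separated list (one value per nesting
    # level); only the outermost level is checked here.
    raw = env.get("OMP_NUM_THREADS")
    if raw is not None:
        try:
            threads = int(raw.split(",")[0])
        except ValueError:
            warnings.append("OMP_NUM_THREADS=%r is not an integer" % raw)
        else:
            if threads * procs_per_node < cores_per_node:
                warnings.append(
                    "OMP_NUM_THREADS=%d with %d process(es) per node leaves "
                    "cores idle on a %d-core node"
                    % (threads, procs_per_node, cores_per_node)
                )
    return warnings


if __name__ == "__main__":
    # Assumption: CFLAGS in the environment reflects how the code was built.
    found = preflight_warnings(
        os.environ, os.environ.get("CFLAGS", ""), os.cpu_count() or 1
    )
    for message in found:
        print("WARNING:", message)
```

An unset OMP_NUM_THREADS is deliberately not flagged, since the default thread count is implementation-defined and is often already the number of available cores.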
(By the way, use of '-O0' can have another unintended
consequence: because '-O2' is often the most tested level,
there may be more errors at '-O0', so it is NOT the safest
choice.)
Benchmarkers do not need to be on staff; so modulate their
cost by project term, and reevaluate the cost equation.