<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 4/5/2013 9:43 AM, Lux, Jim (337C)
wrote:<br>
</div>
<blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"
type="cite">
<pre wrap="">I would think that the problem is more that you can easily stamp out
another 1000 processors than another 10 software developers. HPC
developers are the scarce commodity, and just throwing money at it doesn't
solve the problem.</pre>
</blockquote>
<br>
I'm coming from an academic perspective, but to me it seems that
HPC developers are the scarce commodity not because they're hard to
find (though there's some truth in that, especially for ones with
broad experience), but because they're hard to categorize and the
benefits of a developer are hard to quantify. A lot of people
outside this relatively small Beowulf community have an overly
simplistic 'model' of how computing works -- hardware works or it
doesn't and if it does, a scientist writes a code and gets results.
It's like pressing a button, basically. Nobody thinks about
pressing a button <i>well</i>, or pressing it <i>efficiently</i>,
or even ensuring that whatever happens when it's pressed happens <i>fast</i>.
It's just assumed that it happens as fast as it can, provided the
button is 'working'.<br>
<br>
In this view, buying X more nodes makes a lot more sense than
hiring a developer. There are never enough cycles to begin with, so
all the developer does is help make sure the system or code
'works'. And if the system isn't working, well, that's what your
systems people are for. If the code isn't working? Well, some
lucky grad student will be up all night and day trying to fix it.
Sometimes for weeks. Months. Even years. And, all this time, the
'scientific throughput' on the whole is probably up on the system,
due to the additional nodes, even without those few applications
that aren't working.<br>
<br>
However, once you look at the <i>details</i>, the scenario
often changes - why, yes, those extra nodes are being used all
the time, running an atmospheric model 24/7, on a thousand
processors. Except, whoops, it's using serialized I/O to your
underlying parallel file system - so while it's 'working', you could
be running 4-5x faster with some changes to a library and a few
changed settings. Is this something the scientist should know? We
can debate that; I'm on the side of 'no, they should focus on their
science', but there are points to be made on each side. The fact
is, though, that they often <i>don't</i> notice it, whereas
someone whose focus is not on the scientific output but on the
computational methods and techniques would do so quickly. And,
like that, the equation changes - in the parallel universe where you
<i>hired</i> this person, you've now saved Y node-hours of
computation, with a fairly small time-slice of your expert. How
does this savings compare to the new nodes you'd buy? That depends
on the scale of the operation, but we can run some numbers in a
moment.<br>
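To put rough numbers on that I/O story, here's a minimal sketch - every figure in it (data size, bandwidth, stripe count) is invented purely for illustration, not measured:<br>

```python
# Hypothetical numbers, just to sketch the serialized-vs-parallel I/O math.
# Assume all ranks funnel a checkpoint through one writer at ~500 MB/s,
# while the parallel file system could sustain ~5 effective streams.

data_gb = 200.0          # checkpoint size (assumed)
stream_mb_s = 500.0      # single-writer bandwidth (assumed)
streams = 5              # effective parallel streams (assumed)

serial_s = data_gb * 1024 / stream_mb_s
parallel_s = serial_s / streams   # ideal scaling; reality is messier

print(f"serialized: {serial_s:.0f} s, parallel: {parallel_s:.0f} s, "
      f"speedup: {serial_s / parallel_s:.1f}x")
```

With five effective streams the ideal speedup is 5x; contention and metadata overhead usually pull that down a bit, hence the 4-5x range above.<br>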
<br>
<blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"
type="cite">
<pre wrap="">That is why the real challenge for HPC is in developing smart compilers
and tools that make it easier to do HPC, even at the cost of needing more
computational resources.</pre>
</blockquote>
<br>
I'm absolutely in favor of smart(er) compilers and tools, but like
any tool, I think an expert will wield it with much greater skill,
experience and insight than a novice. Give a chef a knife and some
vegetables and watch as they're turned into perfectly cut pieces.
Give me the same knife and set of vegetables and half would end up
on the floor, I'd probably need a few band-aids, and we'd likely be
ordering take-out. Even 'simple' tools like compiler-enabled
profiling typically present a lot of information that users are
uncertain how to sift through at first, whereas an expert
will know exactly what to look for, how to use it, and get results
quickly.<br>
<br>
(It's occurred to me we might be using 'HPC developer' in very
different ways -- for a dedicated person <i>developing</i> the
majority of a model, some of this might not apply. For a 'specialist'
who works with other scientists to allow them to focus on their
science while they focus on the code, calculations and systems, I
think it applies pretty well.)<br>
<br>
<blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"
type="cite">
<pre wrap="">Just back of the enveloping here..
Let's say a "node" costs about $3k plus about $1500 in operating costs
over 3 years. Make it a round $5000 all told. A developer costs about
$250-300k, fully burdened for a year. So I can make the choice.. Buy
another 150 nodes or pay for the developer to make the processing more
efficient. Of course, if I buy the nodes, I get the faster computation
today. If I buy the developer, I have to wait some time for my faster
solution.</pre>
</blockquote>
<br>
I haven't looked at hardware prices in a while, but I'd argue for
slightly higher hardware costs (IB, more storage, etc.) - not just
the node itself, but ports on the switches, extra disks or Lustre
servers - let's say $6K all told, including operating costs. I admittedly
don't know much about employee costs, but let's imagine a mid-level
HPC specialist at a university having a salary of $75K. The fringe
rate at a few places I just checked hovers around 33-36% or so for
this type of employee, so ignoring things like office space costs,
power, incidentals, etc., and just going with the cost to the
university as 'salary * (1.0 + fringe rate)', we come up with just
over $100K. Let's bump it up to $120K just because.<br>
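The same burdened-cost arithmetic in code, with the assumed salary and fringe rate spelled out:<br>

```python
# Burdened cost of the hypothetical hire; both numbers are the
# assumptions from the text, not real figures from any institution.
salary = 75_000        # assumed mid-level university HPC specialist salary
fringe_rate = 0.35     # middle of the 33-36% range mentioned above

burdened = salary * (1.0 + fringe_rate)
print(f"burdened cost: ${burdened:,.0f}/yr")   # just over $100K
```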
<br>
For simplicity, let's keep the salary static over four years (chosen
as the life of the nodes), so we're comparing $480K for an
employee against 80 nodes, and asking which delivers better usage. The extreme
case of having lots of nodes is, in my opinion, pretty easy. If I
have 2000 nodes now, adding 80 gives me a 4% improvement in my
capabilities, and I'll quite happily challenge anyone who thinks I
can't improve either their codes, workflow or system by more than
4%. What if I have only 200 nodes, though? Then the extra 80 gives
me a 40% increase in job throughput. That <i>sounds</i> pretty
good, and in reality it might be if your codes are well-behaved,
production-quality codes that are tuned, your scientists know how to
modify them for new experiments, and there isn't much need of
expertise, but 40% more cycles would help. I've yet to meet a
scientific department that meets that description, though. From
people running N^2 algorithms when N-log-N methods exist, to people
using NFS as local scratch for large temporary files, to people
managing literally thousands of files <i>by hand</i> for a
parameter-sweep MC code because scripting isn't something they're
familiar with, a decent level of expertise can <i>often</i> render
2-3x factor improvement in usage, and <i>sometimes</i> much more,
but the really high cases (1000x+) are pretty rare. Still, with a
2-3x factor being common, even if the 'average' gain normalized
across all your resources is a mere 1.5x, that still beats the 1.4x
from the extra nodes. And we're just talking about savings in CPU time, not
savings in scientist time.<br>
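The nodes-versus-person comparison above, as a quick script (the per-node and employee costs are the assumed figures from this thread):<br>

```python
# Comparing the two purchases over the assumed 4-year node lifetime.
node_cost = 6_000                  # per node, all-in (assumed above)
employee_cost = 120_000 * 4        # $480K over four years
new_nodes = employee_cost // node_cost   # 80 nodes

for existing in (2000, 200):
    gain = new_nodes / existing
    print(f"{existing} existing nodes: +{gain:.0%} throughput from hardware")

# Even a modest 1.5x average gain from expertise edges out the
# 40% (1.4x) throughput bump at the 200-node scale:
print("expert beats nodes at 200-node scale:", 1.5 > 1.4)
```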
<br>
Of course, then you have the extreme case of very <i>few</i>
nodes - if you've only got twenty, and are looking at either 80 more
or a person, well, it's going to depend heavily on what's being
done, and I'd typically lean towards the nodes if I had only that
binary choice. If I could be creative, though, I'd aim to
coordinate with multiple departments, buy maybe 40-60 nodes, and
hire a person. Besides, that expertise might help you take
advantage of off-site resources like XSEDE even if you have zero
nodes locally.<br>
<br>
<blockquote cite="mid:CD842119.2F4FD%25james.p.lux@jpl.nasa.gov"
type="cite">
<pre wrap="">If I buy the developer, I have to wait some time for my faster solution.
</pre>
</blockquote>
<br>
Yes, if we're talking about developing entirely new methods. But
there's a <i>ton</i> of low-hanging fruit that exists in the mix of
systems tuning, compiler options, <i>basic</i> code changes (not
anything deep or time-consuming), etc., that takes hours or, at
most, a few days, and can have massive impacts. The serial I/O
-> parallel I/O scenario way (way, way) above is one example;
as another, I can't tell you the number of times I've seen
people running production runs with '-O0 -g' as their compilation
flags. Or, not using tuned BLAS / LAPACK libraries. Or running an
OpenMP-enabled code with 1 process per node but having
OMP_NUM_THREADS set to 1 instead of 8. Or countless other things.<br>
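As a flavor of how cheap some of these checks are, here's a hypothetical sanity check for the OMP_NUM_THREADS pitfall - the function name and the 8-core count are mine, purely for illustration:<br>

```python
import os

def check_omp_threads(cores_per_node: int) -> str:
    """Flag the pitfall described above: one process per node,
    but each process running single-threaded."""
    n = int(os.environ.get("OMP_NUM_THREADS", "1"))
    if n < cores_per_node:
        return (f"warning: OMP_NUM_THREADS={n} but the node has "
                f"{cores_per_node} cores; {cores_per_node - n} sit idle")
    return "ok"

os.environ["OMP_NUM_THREADS"] = "1"   # the 1-instead-of-8 case from the text
print(check_omp_threads(8))
```

A job prolog or submission wrapper running a handful of checks like this costs an afternoon to write and catches the same mistake forever after.<br>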
<br>
So that's my lengthy two cents in defense of why it's <i>very
often</i> favorable to hire HPC specialists over more hardware -
the gains are certainly much harder to quantify, and clearly more
variable depending upon the various projects and applications in
use, but in my experience -and in our environment- the gains in
terms of making scientists' jobs easier and less time-consuming, as
well as the savings in CPU hours, more than make it worth it.
Plus, in those rare moments we're not working, the HPC developers
can contribute to discussions on the Beowulf list. Surely that's
something we can all agree we need more of! :)<br>
<br>
Cheers,<br>
- Brian<br>
<br>
<br>
(PS. One thing I cleverly avoided touching upon way back at the top
is that while a skilled computational person can indeed ensure that
calculations are more efficient, fast and working well, they can <i>also</i>
make sure they're working <i>correctly</i>. That's a can of worms
for a different discussion! Getting results != getting <i>correct</i>
results.)<br>
<br>
(PPS. Sorry for the length!)<br>
</body>
</html>