[Beowulf] Why one might want a bunch o' processors under your desk.

Robert G. Brown rgb at phy.duke.edu
Tue May 10 11:54:45 PDT 2005


On Tue, 10 May 2005, Jim Lux wrote:

> To me, this is sort of the essence of engineering:  Engineers are lazy, and
> want to find a way to accomplish a task with the minimum effort on their
> part.  Hence, we have plows and tractors instead of digging with a stick,

This is the essential issue, and we all face it all the time.  Do I
spend ten minutes a cycle doing a repetitive task by hand for as many
as 100 cycles (burning 1000 minutes over several weeks of work)?  Do I
spend some unknown number of minutes writing and debugging a script or
piece of code that will reduce MY time to one minute (to initiate the
task and collect the results) and take 1000 minutes of unattended time
to complete on my system?  Or do I invest some probably larger amount
of time to parallelize it so that it runs on 1000 systems in 1 minute,
letting me do weeks' worth of work in a minute?
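
Just to make the arithmetic concrete, here's the trade-off as a
trivial octave calculation -- a sketch of mine, with the numbers being
simply the illustrative ones from the paragraph above:

  % break-even arithmetic for the trade-off above: 100 cycles of a
  % ten-minute manual task, versus a script that cuts MY attended time
  % to one minute per cycle.
  cycles    = 100;               % repetitions of the task
  per_cycle = 10;                % minutes per cycle, done by hand
  attended  = 1;                 % minutes of MY time per scripted cycle
  by_hand   = cycles*per_cycle;  % 1000 minutes of my time, total
  scripted  = cycles*attended;   %  100 minutes of my time, total
  % the script pays for itself if it takes less than this to write:
  fprintf("break-even development time: %d minutes\n", by_hand - scripted);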

If it is bounded at REALLY JUST 1000 minutes of work, many of us would
just do the damn thing by hand.  If we are script-kids with mad skills
and the task looks easy to do, we might script it if we could do so in
an afternoon, or in any event less than a day.  We'd be more inclined to
do this if the task were error-prone as well as repetitive, as computers
are really more reliable once the script is debugged.  If it were more
like 1000 minutes a month indefinitely -- well, that is worth spending a
week or even a month on developing ways of reducing our time
involvement -- we'd win big before a year was out and wouldn't commit
suicide from the sheer boredom of punching in lots of mindless keys.  If
the result itself has VALUE -- so that time matters -- and various sets
of resources are available, we might consider parallelizing it,
especially if the problem has an obvious decomposition (such as running
1000 instances of 1 minute of independent work each on as many systems
as happen to be available, unattended).

I've come up on the short end of this (either way) over and over again.
I used to run EP tasks on a network of computers the hard way -- login
via rsh and run them, one at a time.  Then I started using a big network
of computers (100+ systems) so I learned expect and TCL, wrote an expect
script, and could submit and manage jobs twenty at a time with a single
command line.  It took a half week to learn and write, but BOY did it
pay off.  The SAME task I also wrote in PVM (using PVM mostly to
distribute the task and reduce the results).  It took weeks to learn PVM
and to debug the task, weeks stretching to months.  Then PVM turned out
to be not terribly robust against a worker node crashing -- it was a
pain to get the task restarted, and until it was I was wasting the
rebooted resource.  Barely broke even in efficiency, but learned PVM and
some valuable lessons.  Same thing for data processing, writing little
scripts to turn data into graphs, doing various account management
things -- sometimes scripting or coding saves time, sometimes not,
depending on just how much and how often I'm going to be doing
something.

What I've learned from all of this is that surprisingly often it is
worth doing even in the cases of marginal benefit.  If there is ANY
chance that the work will be repetitive and repeated over some time, it
is almost certainly better to tackle the task of automating,
parallelizing, and so forth, at least if you have a very clear idea of
the work process involved and can put sane boundaries on the
programming tasks.  This is for several reasons:

   a) You almost always will end up doing the task MORE than you
initially estimate, especially once it is easier and less time consuming
to do so.

   b) You almost always learn something from the process that in the
long run increases your value and productivity, even if the actual
result loses a bit in terms of up-front productivity.

   c) Sometimes you discover things in the process that alter the value
of the task itself.  As in -- "Oh?  You can framble the woodgets in only
one minute apiece now when you're not even there?  My GOD that means I
can actually tackle the twerking of the snagglefrotz!  Eureka!"  Or the
script itself is semiportable and lets you streamline four or five tasks
and not just the one you originally considered.  Or you are forced to
reorganize the task itself to efficiently code it, and whole new vistas
of value open themselves up to you as you work.

That's why I never have any "free" time (despite what Greg might think
based on the prolixity of my bot:-).  I >>always<< have 3-5 coding
projects that get time slices at low priority working around my primary
task queues.  Each of them has both immediate and long term benefits.

For a very specific and apropos example, on Friday I started to learn
octave.  We've used matlab in our teaching here for a semester, and I
had managed to avoid learning or using it myself because of its
absurdly expensive site license.  That is, up to then I had used the
"avoidance" strategy because of the unknown investment required to
learn a new thing
that might conceivably improve my teaching or research or speed up
students' learning.  On Friday, with classes nearly done for the
semester, I risked the time investment.

By Friday evening I had three functioning octave programs that solved
simple dynamics problems (simple harmonic oscillator, gravity/orbit,
gravity/orbit with an Einstein term to illustrate precession) and
plotted e.g. trajectories or whatever, and I was working on a nonlinear
damped driven harmonic oscillator program -- completed just last night
-- that demonstrates deterministic chaos.  Two of the four programs
were basically ports of matlab programs, so I can now see how
matlab-compatible octave really is.  I've encountered some of its
problems (its memory management SUCKS, but I'll bet matlab's does too).
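
For concreteness, here is a minimal sketch of the sort of octave
program involved -- a damped, sinusoidally driven nonlinear pendulum
integrated with octave's built-in lsode solver and plotted, along the
general lines of the chaos demonstration above.  The parameter values
and function name are my own illustration, not the actual code:

  1;  % statement first, so octave reads this file as a script

  % state x = [theta; omega] for a damped, driven pendulum:
  %   theta'' = -sin(theta) - b*theta' + F*cos(w_d*t)
  function xdot = pendulum_rhs(x, t)
    b   = 0.5;      % damping coefficient (illustrative value)
    F   = 1.5;      % drive amplitude, in the chaotic regime
    w_d = 2/3;      % drive frequency (illustrative value)
    xdot(1) = x(2);
    xdot(2) = -sin(x(1)) - b*x(2) + F*cos(w_d*t);
  endfunction

  t  = linspace(0, 100, 5000);        % report the solution at these times
  x0 = [0.2; 0];                      % small initial angle, at rest
  x  = lsode("pendulum_rhs", x0, t);  % integrate the equations of motion
  plot(t, x(:,1));                    % trajectory: angle vs time

To see the sensitivity to initial conditions that makes it chaotic,
perturb x0 by a tiny amount and overlay the two trajectories -- they
diverge completely after a few drive periods.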

So NOW I'm just KICKING myself for not having done this at the beginning
of the semester.  It was easy, and there are lots of fairly obvious
benefits.  There are plenty of people on campus who are using matlab
for fairly trivial interactive use or simple scripts and who have to
pay on the order of $100/year for the privilege -- I'm now convinced
that a significant fraction of them could convert to using octave for
free and NOT pay a big penalty for the port (I'm sure they're
hesitating because of THEIR uncertainty about the risks vs. benefits).
My two workdays of
invested time might thus provide:

 a) A new free tool I can use in my research for this and that without
the hassles of maple or mathematica or matlab (all site licensed with
highly variable costs).

 b) A free tool I WILL use to teach intro physics to majors this fall
that will permit my students to integrate visualization and numerical
solution of a few of the vast array of
non-analytically-solvable-but-common physics problems that are out there
right "next" to the analytically solvable problems they study.  Both
matlab and octave are available to all undergraduate students, and the
code is nearly identical on the two systems for this sort of problem.

 c) A means whereby I can reassure various other researchers that octave
is a viable solution to at least simple problems they might be using
matlab to study, or to their own teaching needs.  In cases where they
successfully port off of paid-for matlab to free octave on their
(already) linux systems, hundreds of dollars in savings accrue to their
groups and ultimately to the University if/when its need for matlab
licenses goes down.  Tens of thousands a year might be saved, overall.

Not bad for two days of effort.

Similar analyses might well apply to your work.  You've described very
well why there still exists a vast collection of antique fortran code
out there that people use simply because nobody wants to tackle the
issues of porting it and validating the port.  Yet this sort of
thinking is really short-sighted and silly -- if you think of the lost
productivity associated with YEARS TO DECADES of that code continuing to
be applied in a highly inefficient, impossible-to-maintain form that
nobody dares to touch because nobody really understands it any more and
the graduate students who once wrote it are working at a car dealership
in Des Moines...

My own knowledge of phased array antennae and the problems you'll
probably have to surmount to (re)write the code you need suggests that
there is a really excellent chance that while doing so you'll discover
or think up some new things that will pay you tenfold for the time you
invest in it.  There is a group in India I've been talking to that is
interested in the same problem, BTW -- they are stuck using a very
expensive commercial package that doesn't parallelize, and in their
environment they cannot AFFORD to just throw money at the problem and
have to throw human intelligence instead.  If you, and they, got
together, defined the problem, found some code base to start from, and
then designed a really good open source tool, it might enable MY
favorite project -- turning all the radiophone towers on the planet
into one giant LOFAR with a baseline of 5000 km or so.
Then we'd finally be able to pick up extraterrestrial intelligent life
and all be famous.

See?

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




