A beowulf for parallel instruction.

Thu Nov 2 08:32:28 PST 2000

On Wed, 1 Nov 2000, J. A. Llewellyn wrote:

> discussed. The experiments you suggest are enticing, but
> remember that our primary objective is the software training
> end (even if I didn't make that very clear in my post).
> The multiple-small-cluster suggestion is something that
> has also cropped up. I wonder what size we need in
> order to make the difficulty of organizing a
> parallelization apparent without it overwhelming the
> entire process. If we can bring this off with a
> half-decent lab it will be marvelous. It looks as if we
> need to think in terms of under 32 nodes total (all
> flavors) and to assemble a list of candidate NICs,
> switches, etc. Any priorities to suggest?

Nothing too concrete beyond what was in my first response.  If you're
focusing on parallel programming instruction (semispecialized to
beowul[v,f]ies), then it would be useful to teach Amdahl's Law and its
generalizations (parallel scaling) by illustration.  Paradoxically, with
a small cluster you DON'T want bleeding-edge communications, because you
want to see nice (i.e. "bad") parallel scaling curves as you distribute
a task among more and more processors.  That is, I'd guess that you will
want to teach students to take a task that DOESN'T scale too well, at
least in a naive parallelization, and "solve the problem" of making it
run efficiently on the available hardware.  They should also solve the
matching problem of recognizing when it will NEVER run efficiently on
the available hardware, and what hardware modifications would be
required to make it run efficiently.
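For instance, the scaling lesson can be sketched numerically.  The Python
below is a minimal sketch, not a standard model: amdahl_speedup() is plain
Amdahl's Law, while ipc_speedup() adds an assumed IPC cost that grows
linearly with the number of nodes (as if a master traded one message with
each worker in turn).  The function names and the timings in the example
are hypothetical.

```python
# Simplified scaling models; the linear IPC term is an assumption, not a law.

def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Amdahl's Law: only the parallelizable fraction is divided among p nodes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def ipc_speedup(t_serial: float, t_parallel: float, t_comm: float, p: int) -> float:
    """Amdahl's Law plus an IPC cost of t_comm per extra node (assumed to be
    serialized, e.g. a master exchanging one message with each worker)."""
    t_one = t_serial + t_parallel
    t_p = t_serial + t_parallel / p + t_comm * (p - 1)
    return t_one / t_p


if __name__ == "__main__":
    # Hypothetical times in seconds: 1 s serial, 100 s parallelizable,
    # 1 s of IPC per extra node.
    for p in (1, 2, 4, 8, 10, 16, 32):
        print(f"{p:3d} nodes: Amdahl {amdahl_speedup(1 / 101, p):5.2f}  "
              f"with IPC {ipc_speedup(1.0, 100.0, 1.0, p):5.2f}")
```

With the linear IPC term the speedup peaks near P = sqrt(t_parallel/t_comm),
here 10 nodes, and then falls as nodes are added, while plain Amdahl's Law
merely flattens out toward 1/serial_fraction.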

This is the motivation for selecting hardware that supports stepwise
improvement of the network, and/or swapping of the underlying hardware,
to illustrate how faster and slower CPUs, memory, and so forth can
affect parallel program design.  You thus might want switches/NICs that
you can force into 10BaseT mode for one part of the course and then
reset/reboot into 100BaseT mode.  A 10:1 multiplier on the serial IPC
fraction ought to let you come up with a set of problems that scale
terribly at 10 Mbps but reasonably well at 100 Mbps, which one can then
at least mentally extrapolate to 1000 Mbps even without the much more
expensive hardware.  This suggests managed switches on which the port
speed can be forced.
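The 10 vs. 100 Mbps exercise can be previewed on paper.  This is a rough
sketch under loudly stated assumptions: each step does work_s seconds of
perfectly divisible computation, and a master then exchanges one msg_bytes
message with each of the p - 1 workers, one at a time, paying a fixed
latency plus transmission time.  All parameter values are made up for
illustration.

```python
# Hypothetical workload: the numbers below are illustrative assumptions,
# not measurements of any real network.

def step_time(p: int, work_s: float, msg_bytes: int,
              bandwidth_bps: float, latency_s: float) -> float:
    """Time for one step: compute is split over p nodes, while the master is
    assumed to exchange one message with each of the p - 1 workers in turn."""
    compute = work_s / p
    comm = (p - 1) * (latency_s + msg_bytes * 8 / bandwidth_bps)
    return compute + comm


def speedup(p: int, bandwidth_bps: float, work_s: float = 0.1,
            msg_bytes: int = 2000, latency_s: float = 100e-6) -> float:
    """Speedup relative to one node on the same (hypothetical) network."""
    t1 = step_time(1, work_s, msg_bytes, bandwidth_bps, latency_s)
    return t1 / step_time(p, work_s, msg_bytes, bandwidth_bps, latency_s)


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 32):
        print(f"{p:3d} nodes:  10 Mbps {speedup(p, 10e6):5.2f}   "
              f"100 Mbps {speedup(p, 100e6):5.2f}")
```

Under these made-up numbers the 10 Mbps curve tops out around 8 nodes and
then gets worse, while the 100 Mbps curve is still climbing at 16.  Calling
speedup(p, 1e9) gives the mental extrapolation to 1000 Mbps; there the
assumed 100 microsecond latency, not bandwidth, becomes the dominant IPC
cost.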

My bias in all of this is pretty clear.  I tend to think in terms of
beowulf engineering (matching the hardware to the problem) more than
software engineering (matching the problem to the hardware) because the
traffic on the list involves the question "How do I design a beowulf
that will do well on my problem?" much more often than it asks "How do I
write my problem so that it runs well on my beowulf?"  Both are
important, of course, as the answer to either one depends at least in
part on the other.  A moderately heterogeneous laboratory would let you
explore both sides independently and as a coupled problem.  On the one
hand, your students could learn that increasing the processor speed on a
problem that is IPC bound may not yield tremendous benefits in terms of
speedup (and may be very costly) by running the same problem on
"identical" networks with differing node speeds.  They can also see when
e.g. memory bandwidth matters by running on nodes with different memory
bandwidths.  Alpha nodes, for example, have exemplary memory speed, and
for certain kinds of problems this is a big win.  For others, their
tremendously inflated cost is a big loss.
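The point about faster CPUs on an IPC-bound problem is easy to quantify in
a toy model.  In this hypothetical sketch a CPU upgrade divides only the
compute term; the per-worker IPC cost (again assumed to be serialized
through a master) is unchanged.  The 1.7 ms IPC figure is illustrative,
roughly one 2000-byte message plus 100 microseconds of latency at 10 Mbps.

```python
# Hypothetical model: compute time shrinks with CPU speed, IPC time does not.

def wall_time(p: int, work_s: float, comm_s: float, cpu_factor: float = 1.0) -> float:
    """One step on p nodes; comm_s is the assumed per-worker IPC cost,
    serialized through a master and unaffected by the CPU upgrade."""
    return work_s / (cpu_factor * p) + comm_s * (p - 1)


def cpu_upgrade_gain(p: int, work_s: float, comm_s: float, cpu_factor: float) -> float:
    """How much faster a step runs after making every CPU cpu_factor times faster."""
    return wall_time(p, work_s, comm_s) / wall_time(p, work_s, comm_s, cpu_factor)


if __name__ == "__main__":
    # 16 nodes, 0.1 s of work per step; 1.7 ms vs. 0 s of IPC per worker.
    print("IPC bound:    ", round(cpu_upgrade_gain(16, 0.1, 1.7e-3, 2.0), 2))
    print("compute bound:", round(cpu_upgrade_gain(16, 0.1, 0.0, 2.0), 2))
```

With these numbers, doubling the CPU speed on 16 nodes makes the IPC-bound
step only about 11% faster, versus the full factor of two when there is no
IPC at all.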

In conclusion, I personally think that it is hard to teach a "pure"
beowulf parallel programming course because of the tremendous range of
hardware and software that can be connected together into a "beowulf".
There are damn few assumptions you can make about specific rates in any
given beowulf, and all sorts of "critical" features of the software
design, from the speedup curve for a given algorithm to the choice of
optimum algorithm itself, can depend nonlinearly or even discontinuously
on those rates and on the hardware design.  Of course there are
applications you can program that are relatively insensitive to design,
but they are the "easy" ones -- relatively coarse grained or
embarrassingly parallel.  I assume you want to teach your students to do
well with the harder ones.
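To make the "discontinuous" part concrete, here is a hypothetical sketch
in which two made-up decompositions of the same problem compete: one
exchanges messages through a master, the other avoids messages by redoing
some redundant work on every node.  The strategy names, costs, and
parameters are all illustrative assumptions.

```python
# Hypothetical choice between two decompositions of the same problem.

def best_strategy(p: int, bandwidth_bps: float, work_s: float = 0.1,
                  msg_bytes: int = 2000, latency_s: float = 100e-6,
                  recompute_s: float = 0.01) -> str:
    """Return the faster of two assumed strategies:
    'exchange'  - each of p - 1 workers trades one message with a master, serially;
    'recompute' - no messages, but each node redoes recompute_s of redundant work."""
    exchange = work_s / p + (p - 1) * (latency_s + msg_bytes * 8 / bandwidth_bps)
    recompute = work_s / p + recompute_s
    return "exchange" if exchange < recompute else "recompute"


if __name__ == "__main__":
    for mbps in (10, 20, 50, 100):
        print(f"{mbps:4d} Mbps, 16 nodes -> {best_strategy(16, mbps * 1e6)}")
```

As the assumed bandwidth rises, the preferred strategy does not change
gradually; it flips from "recompute" to "exchange" somewhere between 20 and
50 Mbps for these numbers.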

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu