(no subject)

Robert G. Brown rgb at phy.duke.edu
Fri Apr 12 06:17:57 PDT 2002


On Thu, 11 Apr 2002, Eric Miller wrote:

> Myrinet.  We are not experts, we have a LOT of questions, and all we want to
> do is see Linux do something cool that we can show our
> friends/students/selves.
> 
> Robert, thank you for your positive and informative reply.

I appreciate your interest and understand your goals, actually; whenever
I present beowulfery to a new group (which I seem to do three or four
times a year) I do exactly the same thing -- a bit of a dog and pony
show.  In addition to the pvm povray demo, I like the pvm mandelbrot set
demo (xep), which I've hacked so the colormap is effectively deeper and so
that it doesn't run out of floating point room so rapidly.  I've been
using or playing with mandelbrot set demo programs long enough that I
can remember when it would take a LONG time to update a single
rubberbanded section.  

Nowadays one can quickly enough get to the bottom of double precision
resolution even on a single CPU -- 13 digits isn't really all that many
when you rubberband down close to an order of magnitude at a time.
Still, with even a small cluster you can get nearly linear speedup and
actually "see" the nodes returning their independent strips -- if you
have a mix of "slow" nodes and faster ones, you can even learn some useful
things about parallel programming just watching them come in and
discussing what you see.
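
For the curious, the decomposition behind a demo like xep is about as
simple as parallel work assignment gets.  The following is just a sketch
in plain C, not xep itself, with an invented node count and image size;
it only shows why each strip can be computed independently.  In the real
PVM demo the work on a strip is what gets farmed out to a worker and
drawn as it comes back.

/*
 * Sketch only: a Mandelbrot image cut into horizontal strips, one per
 * "node".  Here the strips are simply computed in a loop to show the
 * decomposition; no message passing is done.
 */
#include <stdio.h>

#define WIDTH   640        /* invented image size   */
#define HEIGHT  480
#define NNODES  8          /* invented cluster size */
#define MAXITER 1024

/* iteration count for one point c = cr + i*ci */
static int mandel(double cr, double ci)
{
    double zr = 0.0, zi = 0.0;
    int n;
    for (n = 0; n < MAXITER && zr*zr + zi*zi < 4.0; n++) {
        double t = zr*zr - zi*zi + cr;
        zi = 2.0*zr*zi + ci;
        zr = t;
    }
    return n;
}

int main(void)
{
    static int image[HEIGHT][WIDTH];
    int node, x, y;

    /* each "node" gets a contiguous strip of rows; a strip depends on
       nothing but its own pixel coordinates */
    for (node = 0; node < NNODES; node++) {
        int y0 = node * HEIGHT / NNODES;
        int y1 = (node + 1) * HEIGHT / NNODES;
        for (y = y0; y < y1; y++)
            for (x = 0; x < WIDTH; x++)
                image[y][x] = mandel(-2.5 + 3.5 * x / WIDTH,
                                     -1.25 + 2.5 * y / HEIGHT);
        printf("strip %d (rows %d-%d) done\n", node, y0, y1 - 1);
    }
    return 0;
}

Because no strip needs anything from any other strip, the only
communication is handing out the strip bounds and collecting the
finished pixels, which is why this sort of demo scales so close to
linearly on even a modest cluster.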

The only point I was making is that your class should definitely take
the time to go over at least Amdahl's law and one of the improved
estimates that account for both the serial fraction and the
communications time, and get some understanding of the

  embarrassingly parallel (SETI, distributed monte carlo) ->
  coarse grained, non-synchronous (pvmpov, xep) ->
  coarse grained, synchronous (lattice partitioned monte carlo) ->
  medium-to-fine grained, (non-)synchronous (galactic evolution, weather models)

sequencing where for each step up the chain one has to exercise
additional care in engineering an effective cluster to deal with it.  EP
chores (as Eric pointed out) are "perfect" for a cluster because "any"
cluster or parallel computer, including the simplest SMP boxes, will do.
Coarse grained tasks will also generally run well on a "standard" linux
cluster -- a bunch of boxes on a network, where the kind of network and
whether the boxes are workstations, desktops in active use, or dedicated
nodes doesn't much matter.  When you hit synchronous tasks in general,
but especially the finer grained synchronous tasks (tasks where all
nodes have to complete a parallel computation sequence -- reach a
"barrier" -- and then exchange information before beginning the next
parallel computation sequence), then you really have to start paying
attention to the network (latency and bandwidth both), it helps to have
dedicated nodes that AREN'T doing double duty as workstations (since the
rate of progress is determined by the slowest node), and most of these
tasks have a strict upper bound on the number of nodes that one can
assign to a task and still decrease the time of completion.
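
To put a little arithmetic behind that upper bound, here is a toy model,
not a measurement of anything: take Amdahl's serial term, a perfectly
divisible parallel term, and a communication cost that grows linearly
with the number of nodes.  All three times below are invented.

/*
 * Toy speedup estimate: Amdahl's law plus a simple communication term.
 * T(N) = Ts + Tp/N + N*Tc, so speedup peaks at a finite N and then
 * gets worse as nodes are added.  The numbers are made up.
 */
#include <stdio.h>

int main(void)
{
    double Ts = 1.0;     /* serial work (seconds), assumed              */
    double Tp = 99.0;    /* parallelizable work (seconds), assumed      */
    double Tc = 0.05;    /* per-node communication cost (seconds), assumed */
    double T1 = Ts + Tp; /* single-node time */
    int N;

    for (N = 1; N <= 128; N *= 2) {
        double TN = Ts + Tp / N + N * Tc;
        printf("N = %3d   T(N) = %7.2f s   speedup = %6.2f\n",
               N, TN, T1 / TN);
    }
    return 0;
}

With these made-up numbers the total time T(N) = Ts + Tp/N + N*Tc is
minimized near N = sqrt(Tp/Tc), about 44 nodes; beyond that, adding
nodes actually slows the job down.  Real communication models are more
complicated than a single linear term, but the qualitative lesson (a
finite optimal N) survives.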

This last point is a very important one.  It is easy to see a coarse
grained task speed up N-fold on N nodes and conclude that all problems
can then be solved faster if we just add more nodes.  Make sure that
your students see that this is not so, so that if they ever DO engineer
a compute cluster to accomplish some particular task, they don't just
buy lots of nodes, but instead do the arithmetic first...

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





