[Beowulf] What now?

Robert G. Brown rgb at phy.duke.edu
Wed Aug 18 08:00:08 PDT 2004

On Wed, 18 Aug 2004, Jack C wrote:

> Thanks for the input guys. I'd just like to address something Mark Hahn said:
> > anyone with moderate serial programming skill should be able to pick up
> > MPI basics in a couple days.  I believe that MPI is daunting to beginners
> > mainly because it's such a fat interface, and people think that they need
> > to use one-sided comms, collectives, non-blocking stuff all in their first
> > program.  (C++ is quite similar - easy and safe if you don't bother with
> > the more fringe features.)
> I can write in C fairly proficiently (for a high-schooler at least), but I'm 
> simply stuck at how to design the program flow to use the other nodes. 
> Example: I wrote a quick hack to search for primes. Except, with no real calls 
> to MPI, it just runs the same code on all the nodes. How might I split it up? 
> Ie, have one node that comes up with numbers and keeps track, and others that 
> test if it's prime? 
> I don't really expect an answer to that question, I was simply wondering if 
> anyone might know of some articles/tutorial/etc that deal with this.

There are additional resources online that you might look into to learn
about this.  Ian Foster has a lovely online book on parallel programming
(at the algorithmic and design level) that is likely a bit advanced for
you in parts but is nevertheless a truly excellent (and free!) resource.
I have an online book on cluster engineering and operation linked to the
brahma site:


(and linked to my personal home page, url below).  Finally, as noted in
a previous reply, I've covered at least some of this in my very first
"Cluster Kickstart" CWM columns, as have the other columnists in their
own venues.  It sounds like Doug is working hard on putting together an
archival CD of previous issues (which is good, because based on the
frequency with which I get requests for old material there is
considerable demand for it).

I don't have time this morning to give you a BIG lesson on parallel
program design, but perhaps a small one.

Let's pick a nice parallelizable task.  Picking one will give us a good
idea of what it means for a task to be "parallelizable".  It needs to:

  a) Be partitionable.  This just means that the big job of work has to
be something that can be split up into lots of small jobs of work.  Note
that I'm referring to partitioning the WORK, not partitioning the CODE
into functions or subroutines.

  b) The partitioned work needs to be something that can be done
"independently".  By that I mean that if we take the work to be done W
and break it into subsets W_i, the program has to be able to take one or
more steps acting on any of the W_i subsets without requiring additional
information or sequencing from the others.  

The more steps you can take before those subtasks DO need to communicate
or sequence/synchronize, in general, the better, up to the limit of
running completely independent copies of a single program on the
different W_i with different arguments ("embarrassingly parallel
applications"), which is the best of all possible cases.

As I think Mark said, a lovely example is the Mandelbrot set.  This is
an iterated map on the field of complex numbers that either remains
within the set (ambiguously signalled by having a magnitude less than
2.0 "forever") or diverges (becomes greater than 2, which is the
formal radius of convergence of the map), depending on the complex point
where the iterated map is started.  It is also a famous example of a
fractal structure, as the boundary of the Mandelbrot set has
self-similar structures, the set itself has infinite perimeter but
finite area, and so forth.  If one plots, in a color map, the number of
steps near-boundary points take to escape the radius of possible
non-divergence, one gets some truly lovely pictures that you've
doubtless seen in various books and contexts.  Applications that graph
the set and its escaping boundary in this way, and that permit you to
"rubber-band" areas of the complex plane and "zoom in" on the set
boundaries and interesting regions, at one time were abundant and came
with many Linux distros.  For example, there was the "mxp" program in
the Red Hat Linux 6.2 "powertools" collection (still available in src
rpm form on archive mirrors if you look for it).

A very simple source code listing for the mandelbrot set (along with
some pictures) can be found here:


If you look at this code you will note a double loop over steps in x
(real part) and y (imaginary part) of an array of complex numbers.  Each
point is iterated in the embedded while loop until either its magnitude
exceeds 2 or a predetermined large maximum number of iterations are
completed (held to be "infinity" for the purpose of determining whether
or not a point escapes).

Note that this iteration is performed for EACH starting point in the
array, independently of all the others.

SO, we have a parallelizable application.  We can partition the work to be
done (all the points in the array) into independent chunks.  Better
still, we can characterize those chunks with only a very few numbers --
the max and min (corner) coordinates and x and y deltas (six numbers)
even though there can be thousands of points to be done within the
boundaries determined by those coordinates.  Each POINT in each chunk
can be done independently, so of course the chunks can be done
independently as well.

Now your "assignment" is fairly simple.  Take this serial code and
parallelize it.  This means that you need a way of splitting the region
to be done into independent chunks.  You then need to figure out how to
use MPI to get independent processors to do each independent chunk, and
then to send the array of iteration counts they end up with back to a
master program for assembly into a single array and output.
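
The overall shape of the parallel version might look something like the
following sketch (a static row decomposition using MPI_Gather; the grid
size, region, and the escape_count() kernel are illustrative assumptions
of mine, not the actual solution):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* the serial per-point kernel, as in the listing discussed above */
int escape_count(double cx, double cy, int max_iter);

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nx = 600, ny = 600, max_iter = 1000;
    const double xmin = -2.0, xmax = 1.0, ymin = -1.5, ymax = 1.5;
    int rows = ny / size;   /* rows per rank (assumes ny % size == 0) */
    int y0 = rank * rows;   /* first row of this rank's strip */

    /* each rank iterates only its own strip of starting points */
    int *mine = malloc((size_t)rows * nx * sizeof(int));
    for (int j = 0; j < rows; j++)
        for (int i = 0; i < nx; i++)
            mine[j * nx + i] = escape_count(
                xmin + i * (xmax - xmin) / nx,
                ymin + (y0 + j) * (ymax - ymin) / ny, max_iter);

    /* gather every strip back onto rank 0 for assembly and output */
    int *all = (rank == 0) ? malloc((size_t)ny * nx * sizeof(int)) : NULL;
    MPI_Gather(mine, rows * nx, MPI_INT, all, rows * nx, MPI_INT,
               0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("assembled %d x %d iteration counts\n", nx, ny);
        free(all);
    }
    free(mine);
    MPI_Finalize();
    return 0;
}
```

A static split like this is the simplest possible scheme; since points
near the boundary take far more iterations than points that escape
quickly, a master/worker design that hands out chunks on demand
balances the load better, and is a good second iteration of the
assignment.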

As an exercise for the truly motivated, embed the result into the
venerable old mxp program so that it can be displayed and rubberbanded
in X11 in real time.

All of this has been done, of course -- if you look (google) around you
can find both MPI and PVM versions of Mandelbrot set generators (xep is
the PVM one), with rubberbanding and so forth.  The PVM version uses
very primitive X and is in need of being brought into the 21st century
with a Gtk makeover.  I haven't played as much with the MPI version
since I'm mostly a PVM programmer, when I do parallel library
programming at all.

This task is coarse grained enough that you can easily see nearly linear
speedup as you spread the set across more and more processors, at least
if you keep the maximum iteration count high and the x and y deltas
small, so that each processor does a lot of work compared to the time
required to receive its initial conditions and return the evaluated
arrays.

There.  Now all I have to do is save this reply and clean it up a bit,
and I'll have a future CWM column all ready to go.  Waste not, want not,
I always say...;-)


> Thanks again everyone,
> -Jack C
> jack {at} crepinc.com
> http://www.crepinc.com
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list