[Beowulf] Monkey Business

Robert G. Brown rgb at phy.duke.edu
Tue Mar 14 11:24:52 PST 2006

On Tue, 14 Mar 2006, Douglas Eadline wrote:

>> PVM?  What is this,  1995?
> Prepare yourself. You may have awakened a certain prodigious PVM
> practitioner who can write cogent arguments faster than most people
> can read them.

Naaa, not worth it.  MPI vs PVM is an argument that is a decade or more
old all by itself, and in my own experience it is USUALLY decided by a
person's age and personal history, not so much by either library's
fundamental virtues relative to the other.

To understand the argument at all, one needs to understand the
individual history of the two message passing libraries.  MPI was
invented by a consortium of supercomputer manufacturers and users in
response to the demand by the US government (fed up with having to pay
to port a very expensive code base to each successive generation of
supercomputer) that a portable interface be developed to hide
proprietary detail, and that comparative performance analysis would only
be done using code built from this portable base.  It is the product of
a consortium, and has ALWAYS had real money and a lot of competing
interests associated with it.

PVM was invented as a >>computer science research project<< to do
>>heterogeneous<< (network based) distributed computing back in 1989 or
thereabouts by Vaidy Sunderam and Al Geist, joined later by Bob Manchek
and Jack Dongarra (and by now a cast of thousands).  It was one of the
archetypical open source projects, and I would personally argue that the
"beowulf" -- by definition, remember, a parallel supercomputer built out
of COTS parts -- was invented at this moment, by these guys, for all
that it took another six or seven years for an open source operating
system to come into existence that could broaden the term "COTS" so that
it referred to off-the-shelf MASS MARKET hardware and not just
collections of workstations and big iron supercomputers (commodities,
sure, but fairly expensive ones:-).  In the meantime, there were plenty
of folks who were using tens to hundreds of Unix workstations in
parallel back when Linux didn't even exist yet.

For this reason, people who have been more strongly into the
heterogeneous distributed computing model (as I have, from late 1992 on)
tend to prefer PVM as it basically provides a comfortable environment
for wrapping up a lot of socket code and program management code they'd
otherwise have to write themselves, but which still preserves the "feel"
of that low level message passing code.  They associate PVM pretty
intimately with the whole CONCEPT of distributed computing, including
but not limited to beowulfery.

People who come from the big iron side of things who have migrated into
the beowulf world similarly tend to prefer MPI, and often had a large
code base of MPI code to start from.  Then there is a chunk of people
who have started doing cluster computing SINCE the early beowulf days
with its channel bonded ethernet and TCP/IP based communications, who
have more or less grown up with MPI supporting high end IPC hardware
with low latency, native drivers while PVM, for the most part, has not.
PVM has also languished somewhat from a lack of love -- Jack Dongarra
seems to be the only one who still advances the project via input from
students, and even with this support it has been a long time since any
sort of substantive advance has been made in PVM itself.

I tend to view the issue this way.  If you are planning to do a
distributed computing project on TCP/IP over ethernet (any speed),
especially in a heterogeneous environment, especially a computation with
anything like a master-slave flavor, PVM is nearly perfect.  It is also
great for learning about parallel programming when you have no
particular project in mind and are not inclined to invest in IPC
hardware that likely costs as much or more than your nodes themselves,
per node.

It is easy to obtain and install (prebuilt for nearly any distribution
within the distribution).  It is trivial to program.  It gives you a
couple of nifty operational consoles -- xpvm and the tty pvm monitor --
that can help you visualize the program's operation at moderate scales,
let you directly control the virtual machine it runs on, and more.  It
is stable and relatively bug free, in part because it is "old code" and
its networking support is fairly straightforward and standardized.  Its
message passing calls are easy to understand in terms of the underlying
data structures involved.
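
As an illustration of the master-slave shape described above, here is a
minimal sketch in plain Python.  It is NOT PVM and uses none of its API:
ordinary child processes and pipes stand in for spawned tasks and
messages, and the names WORKER_SRC and run_master are made up for the
example.  In PVM itself the analogous steps would be handled by calls
such as pvm_spawn, pvm_send and pvm_recv.

```python
import subprocess
import sys

# Hypothetical worker program: reads one integer per line on stdin and
# writes its square on stdout.  It stands in for a spawned slave task.
WORKER_SRC = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    n = int(line)\n"
    "    print(n * n, flush=True)\n"
)


def run_master(tasks, n_workers=3):
    """Deal tasks out round-robin to n_workers children, gather results.

    Assumes a small batch: all tasks are written before any result is
    read, so everything must fit in the OS pipe buffers.
    """
    workers = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_workers)
    ]

    # "Send" phase: remember which worker received each task.
    assignment = []
    for i, task in enumerate(tasks):
        worker = workers[i % n_workers]
        worker.stdin.write(f"{task}\n")
        assignment.append(worker)

    # Closing stdin flushes the writes and signals "no more work".
    for worker in workers:
        worker.stdin.close()

    # "Receive" phase: each worker answers in the order it was asked,
    # so reading in task order returns results in task order.
    results = [int(worker.stdout.readline()) for worker in assignment]

    for worker in workers:
        worker.stdout.close()
        worker.wait()
    return results


if __name__ == "__main__":
    print(run_master([1, 2, 3, 4, 5], n_workers=2))
```

The master never needs to know how a worker computes its answer, only
how to hand it a message and wait for a reply, which is the part a
message passing library takes off your hands.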

Using PVM it is fairly easy to write programs that CAN be tuned by hand or
dynamically to a heterogeneous environment.  In fact, PVM can actually
run a single program on multiple architectures -- one of my first
exposures to it was one of Vaidy's presentations in which he showed
scaling results for a computation that was being run in parallel across
a Cray, a cluster of Sun workstations, a cluster of DEC workstations
(this WAS 1992 and DEC still existed:-), and a cluster of I think HPs or
AIX boxes, cannot remember.  One computation, four or five distinct
binaries, ethernet for all IPCs.  Tres cool.

Even today, I'm not at all certain that a version of MPI exists that
>>can<< do this.  Sure, with Linux nearly ubiquitous in production
cluster environments there is less incentive than there was a decade
plus ago, but even now PVM "could" be used to run a single computation
across e.g. i386 and x64 architectures, using native binaries on both
(not i386 compatibility binaries and libraries on both).  PVM also gives
you fairly straightforward control over just how the job distributes
itself, permitting you (with some effort) to invoke multiple instances
of a job per node, respawn a worker task on a crashed node, and so on.
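
The respawn-on-crash idea can be sketched in a few lines, again in
plain Python rather than PVM.  The assumptions are loud ones: a "crash"
is simulated by the worker exiting with a non-zero status on its first
attempt at any even-numbered task, everything runs on one host (so
"multiple instances per node" is just max_parallel), task values are
unique, and WORKER_SRC and run_with_respawn are hypothetical names.

```python
import subprocess
import sys
import time

# Hypothetical worker: squares its first argument, but simulates a crash
# (exit status 1) on the first attempt at any even-numbered task.
WORKER_SRC = (
    "import sys\n"
    "n, attempt = int(sys.argv[1]), int(sys.argv[2])\n"
    "if n % 2 == 0 and attempt == 0:\n"
    "    sys.exit(1)\n"
    "print(n * n)\n"
)


def run_with_respawn(tasks, max_parallel=2, max_attempts=3):
    """Run one worker per task, re-queueing tasks whose worker died."""
    pending = [(task, 0) for task in tasks]  # (task, attempts so far)
    running = {}                             # Popen -> (task, attempt)
    results = {}
    respawns = 0

    while pending or running:
        # Fill any free slots with pending work.
        while pending and len(running) < max_parallel:
            task, attempt = pending.pop(0)
            proc = subprocess.Popen(
                [sys.executable, "-c", WORKER_SRC, str(task), str(attempt)],
                stdout=subprocess.PIPE,
                text=True,
            )
            running[proc] = (task, attempt)

        # Reap finished workers without blocking on any single one.
        for proc in list(running):
            if proc.poll() is None:
                continue  # still running
            task, attempt = running.pop(proc)
            output = proc.stdout.read()
            proc.stdout.close()
            if proc.returncode == 0:
                results[task] = int(output)
            elif attempt + 1 < max_attempts:
                pending.append((task, attempt + 1))  # respawn later
                respawns += 1
            else:
                raise RuntimeError(f"task {task} failed {max_attempts} times")

        time.sleep(0.01)  # avoid spinning while workers run

    return results, respawns


if __name__ == "__main__":
    print(run_with_respawn([1, 2, 3, 4]))
```

The bookkeeping (what was running where, and how many times it has been
tried) is the part the master has to own; detecting the death of a local
child process is the easy case compared to noticing a dead remote node.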

Here is where I wish that more progress had been made with the tool
itself -- it isn't as easy to do this as it should be, as there aren't
enough low level calls within the library to make it easy to dynamically
detect events like node crashes, nor are there good ways to detect and
respond to events like the node coming back up or a new node being added
to an old running computation.  Another feature I've longed for is a set
of calls to make it EASY to create a non-blocking front end for a
master-slave sort of task, to facilitate creating a UI or GUI for a
running parallel program.  Doing this from scratch is high art, not at
all easy or straightforward.

Again, I'm not sure that MPI does any better at these sorts of things.
All of them can be done, but they require a way above average knowledge
of coding to do them -- minimally things like fork/exec with pipes or
sockets or threads and memory based IPCs to do the front end, a front
end that can invoke the task spawner or a USR signal processor or the
like to manage the task respawning problem, and there just plain is no
easy way to monitor node status to automate this and most of the
solutions I've seen (or written) are ugly one-offs.  I actually think
using e.g. libwulf and xmlsysd (the one I've written that is NOT a
one-off) would make part of this more doable than anything that is
readily available at the systems or PVM level.
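
For the non-blocking front end, the core trick is to poll the worker's
pipe with a short timeout instead of blocking on a read, so the loop
stays free to service a UI.  What follows is a minimal sketch under
stated assumptions, not a recipe: it is POSIX only (select on pipes does
not work on Windows), the worker is a toy that prints three progress
lines, a simple counter stands in for the UI work, and WORKER_SRC and
run_front_end are hypothetical names.

```python
import os
import selectors
import subprocess
import sys

# Hypothetical worker: reports progress three times, with short pauses.
WORKER_SRC = (
    "import time\n"
    "for i in range(3):\n"
    "    time.sleep(0.05)\n"
    "    print(f'step {i}', flush=True)\n"
)


def run_front_end(tick=0.01):
    """Poll a worker's pipe without blocking.

    Returns (messages, idle_ticks), where idle_ticks counts the loop
    passes on which nothing had arrived and a UI could have been served.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SRC], stdout=subprocess.PIPE
    )
    fd = proc.stdout.fileno()
    sel = selectors.DefaultSelector()
    sel.register(fd, selectors.EVENT_READ)

    messages, idle_ticks, buf = [], 0, b""
    done = False
    while not done:
        if not sel.select(timeout=tick):
            idle_ticks += 1  # a real front end would redraw here
            continue
        chunk = os.read(fd, 4096)  # raw read: returns what is available
        if not chunk:
            done = True            # EOF: the worker closed its end
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            messages.append(line.decode())
    if buf:
        messages.append(buf.decode())  # trailing partial line, if any

    sel.unregister(fd)
    sel.close()
    proc.stdout.close()
    proc.wait()
    return messages, idle_ticks


if __name__ == "__main__":
    print(run_front_end())
```

Even this toy shows why the job is fiddly: the front end has to do its
own line buffering and its own end-of-stream handling, and that is
before adding a second worker or a respawn policy.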

To conclude, it is perfectly reasonable for an MPI-raised coder, or a
coder who needs the low latency of the high end interconnects, to wonder
why PVM still exists.  My good buddies at the Monkey who focus on MPI or
e.g. work on OpenMPI or MPI-2 would also point out that for the last six
or seven years, at least, there has been a LOT of energy expended on
making Open Source MPI work really well in a very sophisticated way for
beowulf-style supercomputing clusters in particular, where for the
decade before that there was the distinct feeling that supporting this
kind of architecture was more or less of an afterthought for the MPI

It is EQUALLY easy for a PVM-raised coder to wonder why so much energy
has been expended adapting to a network a tool that was originally
designed by a committee intent on optimally supporting some
lowest-common-denominator picture of parallel message-based IPCs on
hardware that might or might not actually BE message passing hardware
behind the interface.  Such a coder might also speak of Dongarra, Geist,
Sunderam and the rest in a hushed tone of respect and suspect that PVM
hides a bunch of "real computer science", however dated, behind its
simple facade of systems calls.  However old and faded you might think
it is, PVM was once brilliant and inspiring and to many of us it retains
its old luster.

Most of this was laid out in a lovely white paper "PVM and MPI: A
Comparison of Features" by Geist, Kohl, and Papadopoulos from back in
1996 and still available on www.netlib.org/pvm3, Dongarra's primary
package drop.

> Besides, PVM is alive and well in 2006
> http://www.pvmmpi06.org/

Because fundamentally, PVM is still pretty damn easy to install, code up
from a template and use, especially for master-slave type programs or to
write relatively simple ethernet based demos.  And because all us Old
Guys haven't died off yet...;-)

In fact, if you look at the list of speakers giving invited talks at this
conference, hey, there are good old Vaidy Sunderam and Al Geist, still
going strong.



> --
> Doug
>> On Mon, 13 Mar 2006, Douglas Eadline wrote:
>>> Be the first to read about the new Tyan desk side system in this week's
>>> news wrap-up. And while you are there check out the latest MPI Joy
>>> articles, the LCI meeting coming in May, Robert Brown on PVM and more.
>>>  http://www.clustermonkey.net
>>> Primates only.
>>> --
>>> Doug
>>>
>>> PS And don't forget to hop over to the Cluster Agenda and add your two
>>> cents (http://agenda.clustermonkey.net)

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
