[Beowulf] MPI vs PVM

Robert G. Brown rgb at phy.duke.edu
Tue Jan 4 10:01:04 PST 2005

On Tue, 4 Jan 2005, Mark Hahn wrote:

> > You might also consider "Why are PVM and MPI so Different?"  by Bill
> > Gropp and me in the proceedings of the 4th European PVM/MPI Users' Group
> > Meeting, Springer Lecture Notes in Computer Science 1332, 1997,
> > pp. 3--10.  In that paper we tried to focus on the sources of the
> > implementation-independent semantic differences.
> that paper and other really good stuff is here:
> http://www-unix.mcs.anl.gov/mpi/papers/archive/

While we're on it, there is also:


from back in 1996.  The features of the two haven't really changed since
that comparison was written, EXCEPT that vendor support, in the form of
non-TCP hardware drivers for the major low-latency interconnects, has
perhaps been directed somewhat more at MPI than at PVM, although one
could argue that this is also a decision of the PVM maintainers.

Still, even this isn't all that big a differentiator.  There is PVM-GM
(for Myrinet), and SCI (Dolphinics) can also apparently be embedded into
PVM, with TCP used to set up an SCI-based memory map that performs the
actual communication.

This is described in a nice (and fairly current) white paper on the
dolphinics site that reiterates more of the same stuff that I've been
saying.  That is, PVM is generally preferred by Old Guys Like Me (who
started with PVM), people who run on a heterogeneous cluster, and
"programmers who prefer its dynamic features".  MPI is preferred by
people who want the best possible performance (maybe -- I haven't seen a
lot of data to support this but am willing to believe it), especially on
a homogeneous cluster; by people who have code they want to be able to
port to a "real supercomputer" (which, as the document above on their
mutual history notes, generally runs MPI but probably WON'T run PVM);
and by people who started out with it.

As in, whatever you take the time to learn first will likely become your
favorite -- both work pretty well and have the essential features
required to write efficient message-passing parallel code.  At one time
there was even talk of merging the two; it's sort of a shame that this
never was pursued.

In fact, here is a lovely project for some bright CPS student (or
professor, looking for a class assignment) who might be reading this
list.  Write (or have your students write) a PVM "emulator" on top of
MPI or, better, an MPI emulator on top of PVM.  Or get REALLY ambitious,
and separate the interconnect layer, the message passing layer, and the
actual communications calls (wrappers) so that a single tool provides
either the PVM API or MPI API.
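Just to make the layering concrete, here is a toy sketch of the idea in
Python.  Everything in it is invented for illustration (LocalTransport,
MessageLayer, the two facade classes and their method names are loose
analogues of pvm_pkint/pvm_send and MPI_Send/MPI_Recv, not the real
APIs): one transport layer, one message-passing layer, and two thin API
wrappers on top that interoperate because they share the layers beneath.

```python
# Hypothetical layered design: interconnect layer, message layer, and
# two API facades (PVM-flavoured and MPI-flavoured) over the same stack.
from collections import deque

class LocalTransport:
    """Interconnect layer: in-memory queues standing in for TCP/GM/SCI."""
    def __init__(self, n_tasks):
        self.queues = {tid: deque() for tid in range(n_tasks)}

    def deliver(self, dest, payload):
        self.queues[dest].append(payload)

    def fetch(self, tid):
        return self.queues[tid].popleft()

class MessageLayer:
    """Message-passing layer: wraps payloads in (source, tag) envelopes."""
    def __init__(self, transport):
        self.transport = transport

    def send(self, src, dest, tag, data):
        self.transport.deliver(dest, (src, tag, data))

    def recv(self, tid):
        return self.transport.fetch(tid)

class PvmFacade:
    """PVM-flavoured wrapper: pack-then-send style."""
    def __init__(self, layer, tid):
        self.layer, self.tid, self.buf = layer, tid, []

    def pkint(self, value):           # loose pvm_pkint analogue
        self.buf.append(value)

    def send(self, dest, tag):        # loose pvm_send analogue
        self.layer.send(self.tid, dest, tag, list(self.buf))
        self.buf.clear()

class MpiFacade:
    """MPI-flavoured wrapper: buffer-in-the-call style."""
    def __init__(self, layer, rank):
        self.layer, self.rank = layer, rank

    def send(self, buf, dest, tag):   # loose MPI_Send analogue
        self.layer.send(self.rank, dest, tag, buf)

    def recv(self):                   # loose MPI_Recv analogue
        return self.layer.recv(self.rank)

# A PVM-style task 0 talks to an MPI-style task 1 over the shared stack.
transport = LocalTransport(2)
layer = MessageLayer(transport)
pvm = PvmFacade(layer, 0)
mpi = MpiFacade(layer, 1)
pvm.pkint(42)
pvm.send(dest=1, tag=7)
src, tag, data = mpi.recv()
```

The point of the exercise is the last few lines: once the interconnect
and envelope layers are factored out, which API dialect a task speaks is
just a question of which wrapper it links against.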

Speaking for myself, I'd like the merged tool to have a PVM-like control
interface for managing the "virtual cluster" -- it is, I think, one of
those "dynamic" features people prefer, especially when one learns to
use the built-in diagnostics.  I'd like to
have MPI's back-end communications stack, including all the native
drivers for low-latency high-bandwidth interconnects.  I'd like things
like broadcast and multicast (one to many, many to many) communications
to be transparently and EFFICIENTLY implemented -- to really exploit the
hardware features and/or topology of the cluster without needing to
really understand exactly how they work.  I'd like a new set of
"ps"-like commands for monitoring cluster tasks running under the PVMPI
tool, so one doesn't have to run a bproc-like kernelspace tool to be
able to simply monitor task execution.  I'd like the tool to be
>>secure<< and not a bleeding wound on any LAN or WAN virtual cluster,
whether or not it is firewalled.  And I'd REALLY like a very few
features to be added to support making parallel tasks robust -- not
necessarily checkpointing, but perhaps a "select"-like facility that can
take certain actions if a node dies, including automagically re-adding
it (post reboot, for example) and restarting a task on it.
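The "select"-like facility wished for here is easy to sketch in
miniature.  The fragment below (all names invented; the "nodes" are just
the far ends of local socketpairs) watches per-node heartbeat sockets
with select(), treats EOF as node death, and invokes a recovery callback
that a real tool might use to re-add the node and restart its task.

```python
# Toy select()-based node-death watcher; assumes heartbeat sockets,
# one per node, where EOF means the node has gone away.
import select
import socket

def watch_nodes(node_socks, on_node_death, max_rounds=5):
    """node_socks: dict mapping node name -> heartbeat socket."""
    alive = dict(node_socks)
    for _ in range(max_rounds):
        readable, _, _ = select.select(list(alive.values()), [], [], 0.2)
        for sock in readable:
            name = next(n for n, s in alive.items() if s is sock)
            if not sock.recv(64):        # EOF: the node died
                on_node_death(name)      # e.g. re-add node, restart task
                del alive[name]
    return set(alive)                    # names of nodes still alive

# Simulate a two-node virtual machine where node "n1" dies.
a1, n1 = socket.socketpair()
a2, n2 = socket.socketpair()
restarted = []
n2.send(b"heartbeat")                    # n2 stays healthy
n1.close()                               # n1 "dies"
still_alive = watch_nodes({"n1": a1, "n2": a2},
                          on_node_death=lambda n: restarted.append(n))
```

A real facility would of course need timeouts tuned for WAN latencies
and a retry/rejoin path, but the core loop is no more than this.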

After all, message passing is message passing.  One establishes a
"socket" between tasks running on different hosts.  One writes to the
socket, reads from the socket (where the socket may be "virtual" through
e.g. SCI mapped memory).  One maintains a table of state information.
Everything else is fancy frills and clever calls.  Raw sockets (or
mapped memory) through MPI or PVM or PVMPI is just a matter of how one
wraps all this up and hides it behind an API.
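The paragraph above can be written out in a few dozen lines: a "socket"
between two tasks, a write side, a read side, a length-prefixed wire
format, and a small table of state.  This sketch uses a local socketpair
and a second thread as the "remote" task; a PVM or MPI layer does the
same thing over TCP, GM, or SCI mapped memory, just wrapped in an API.

```python
# Minimal message passing over a raw socket: 4-byte big-endian length
# header followed by the payload, plus a per-task state table.
import socket
import struct
import threading

def send_msg(sock, payload: bytes):
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock) -> bytes:
    def read_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed")
            buf += chunk
        return buf
    (length,) = struct.unpack("!I", read_exact(4))
    return read_exact(length)

left, right = socket.socketpair()
state = {"sent": 0, "received": 0}       # the table of state information

def peer():                              # the task on the "other host"
    msg = recv_msg(right)
    send_msg(right, b"ack:" + msg)

t = threading.Thread(target=peer)
t.start()
send_msg(left, b"hello")
state["sent"] += 1
reply = recv_msg(left)
state["received"] += 1
t.join()
```

Everything a message-passing library adds -- envelopes, tags, packing,
collectives -- sits on top of exactly this read/write/state skeleton.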

Alas, after a bit of preliminary work back in 1996, e.g.:


Fagg and Dongarra seem to have let this particular project slip.


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
