[Beowulf] RE: Compare and contrast MPI implementations

Robert G. Brown rgb at phy.duke.edu
Mon Dec 19 06:48:15 PST 2005

On Mon, 19 Dec 2005, Leif Nixon wrote:

> And an irritating problem, at that, because you often have to fight a
> huge, complex program written by an author that has a real problem
> understanding the concept of a cluster that *isn't* purpose-built to
> run a single frobotz version and nothing else.
> [For the sake of brevity, I here omit 300 lines of ranting]
> So, getting back to my original point, modules *don't* help
> David with his problem.

Oh, I would probably agree with all the ranting (and will kindly omit
even MORE lines of it -- and you know that I can, he says
ominously...:-).  However, the fundamental problem isn't just the
authors of programs or the lack of a "better" form of support for bizarre
edge-case software; it is that, well, life can be pretty complex,
cluster life being no exception to the rule, and general purpose cluster
life being very MUCH the rule.  That's one reason Duke's CPS department
has people working on COD and other such tools -- the need for program
provisioning goes beyond just MPI flavors and libraries these days to
entire boot environments (not all of which are necessarily going to be
even Linux, let alone a single FLAVOR of Linux).

Their solution is to say screw it.  DEVELOP a special purpose cluster
image, and instead of using a batch processor to allocate booted CPUs,
we'll use a higher level, more coarse grained task processor to allocate
UNbooted CPUs and let you boot them into your own custom environment,
per task.  Which is not, actually, completely crazy.  In fact, with
wulfware in hand, it is probably really, really doable and might even
scale decently depending on how many of those nasty edge-case tasks you
had to support.
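
A minimal sketch of that allocation idea, not COD's actual design or API:
every name here (Task, NodePool, the image label) is hypothetical, and the
"boot" is just a dictionary entry standing in for a real network boot.  The
point is only that the scheduler hands out whole UNbooted nodes and records
which image each task asked for.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    image: str          # hypothetical label for the task's custom boot image
    nodes_needed: int


@dataclass
class NodePool:
    unbooted: List[str]                                   # idle, powered-down nodes
    booted: Dict[str, str] = field(default_factory=dict)  # node -> image it runs

    def allocate(self, task: Task) -> Optional[List[str]]:
        """Grant whole nodes to a task, or None if too few are free."""
        if task.nodes_needed > len(self.unbooted):
            return None
        granted = [self.unbooted.pop() for _ in range(task.nodes_needed)]
        for node in granted:
            # Assumption: a real system would trigger a netboot here.
            self.booted[node] = task.image
        return granted

    def release(self, nodes: List[str]) -> None:
        """Return nodes to the unbooted pool when the task finishes."""
        for node in nodes:
            self.booted.pop(node, None)
            self.unbooted.append(node)
```

A task that wants two nodes with its own image would call
pool.allocate(Task("frobotz", "frobotz-image", 2)) and hand the nodes back
with pool.release(...) when done; anything finer grained than whole nodes
is deliberately out of scope.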

Now mind you, I'd have a penguin tattooed on my left buttock cheek
without anesthetic before I willingly supported any scheme with a
nightmarish degree of heterogeneity professionally with anything less
than a huge supporting cast of ultra-skilled minions.  I would probably
take to drinking heavily -- make that heavilier -- even then.  To avoid
divorce court ("why are you drunk and where did you get the penguin
tattooed on your butt" NOT being a discussion I want to have with my wife
anytime soon:-) I'd expend really, really considerable energy trying to
convince the potential users of a cluster, funders of a cluster,
designers of a cluster, supporters of a cluster, to use a bit of sense
in laying out the "rules" for the cluster (a.k.a. "policies", a.k.a.
"what we will support").  Those rules need to make it possible to Just
Say No Way in Hell Will That Happen Here to would-be users, at least
would-be users who want you to HELP them resolve problems like that.  Or
(with a COD-like solution) You Can Have Nodes But We Won't Help With The
Image, Support, Operation (beyond providing you with this documentation
and these templates and that prototyping mini-cluster over there, have
at it and try not to break anything not that we care 'cuz I'm going out
bar-hopping with my little black-and-white friend here...).

I will now really REALLY try not to start a rant of my own, especially
one that indicts e.g. cernlib, horrendous software builds, programs that
need a library only available in RH 6.2 (sorry), Microsoft products in
general, especially their "cluster", or flightless birds with a fondness
for anchovies portrayed in a tasteful art medium on smooth skin...


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
