[Beowulf] PVM on wireless...

Fri Feb 8 09:22:02 PST 2008

  On Fri, Feb 08, 2008 at 11:40:50AM -0500, Robert G. Brown wrote:
  > On Fri, 8 Feb 2008, kohlja at ornl.gov wrote:
  >> Awesome Strangelove Reference...!  :-D
  >>
  >> "I Have A Plan...!"  :-)
  >>
  >> Yep, I am now getting inundated with people having rsh/ssh problems
  >> with PVM, so a higher power clearly wants me to better document this.
  >>
  >> Thanks Much, Will Do...  :)

  > Excellentamundo!

I'm already getting lots of practice explaining how to get this stuff
to work for 3 separate PVM users...  :)

  > At some point at your convenience in the future when
  > you have all kinds of time to metaphorically sit down and REALLY work
  > over PVM...

Ahhh...  Lemme picture that moment...  :-D

  > I have about 800 specific suggestions for bringing it up to
  > current and modern and everything.  Just a wee list.  You know:

  >   * Purge aimk for all time, die die die

Ha ha ha...  You don't like "aimk"...?  :-)

Yeah, PVM was originally pre-autoconf...  Too bad, eh...?  :)

  >   * Actually use the FSH so e.g. apropos pvm works.

I'm assuming you don't mean FSH="Follicle Stimulating Hormone";
did you mean "SSH", or am I clueless...?

Sorry, I guess I'm not "up" on all the latest \/32/\/4[vL/\r...  :-}

  >   * Document the hell out of everything

Yes!  :D

  >   * Rewrite the network back end in a way that openly encourages high
  > end network vendors to contribute reusable non-IP native drivers

Ha ha ha...  Tried to cater to vendors many times.  See all those funny
arch subdirs in pvm3/src...?  Yeah, been there, done that...

(Though I agree that building on top of some generic "standardized"
networking layer would be "nice" - there are so many to choose from... :)

  >   * Add a (possibly macro-driven) middle layer that makes PVM into MPI
  > as well -- one set of actual message-passing functions, two conformally
  > mapped call interfaces.

You mean like "PVMPI"...?

  http://www.netlib.org/utk/papers/pvmpi/paper.html

Or its offspring "MPI-Glue"...?

  http://www.scientific-computing.de/people/rabenseifner/projects/mpi_glue.html

Or do you mean something completely different...?  :)

  >   * Make Ctrl-C work so one can break out of the annoying timeout on add
  > hosts when things don't work.

Yeah, bummer eh?  :)  Where did Bob Manchek go to anyway...?

(He's the real culprit behind the majority of PVM code, btw,
I merely "inherited" the maintenance job... :)

  >   * Make the console capable of cleaning up after a crash or
  > interruption.

We talked about things we could do there, e.g. to clean up old
leftover /tmp/pvmd.* files, etc, but it was always easier to
just remove the files by hand...!  ;)
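
For what it's worth, the by-hand cleanup could be sketched roughly as
below.  This is only a sketch under assumptions: pvm_clean_stale is a
hypothetical helper (not part of PVM), the leftover file is assumed to be
named pvmd.<uid>, and the daemon process is assumed to be called pvmd3.

```shell
#!/bin/sh
# Sketch only: pvm_clean_stale is a hypothetical helper, not part of PVM.

# Remove this user's leftover pvmd.<uid> file from DIR (default /tmp),
# but only when no pvmd3 process owned by the user appears to be running.
# If pgrep is unavailable, the check fails and the file is treated as stale.
pvm_clean_stale() {
    dir=${1:-/tmp}
    uid=$(id -u)
    if pgrep -u "$uid" -x pvmd3 >/dev/null 2>&1; then
        echo "pvmd3 is running; leaving $dir/pvmd.$uid alone" >&2
        return 1
    fi
    rm -f "$dir/pvmd.$uid"
}
```

Calling pvm_clean_stale with no argument targets /tmp; passing a
directory makes it easy to try out on a scratch copy first.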

Good suggestions, though.  I'll add them to my "to do" list,
along with any others that may come up...?  :-)

Thanks, Man!

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

  > that kind of thing...;-)

  >    rgb

  >>
  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
  >>
  >>  On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
  >>  > On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:
  >>
  >>  >> I admit this may be an antiquated cynical mentality, and I
  >>  >> further concur that PVMNETSOCKPORT is an obvious omission
  >>  >> in the basic documentation/faq...
  >>
  >>  > As they say, you can't RTFM if there ain't no FM... (or if the solution
  >>  > exists but isn't there).
  >>
  >>  > One is reminded of Dr. Strangelove, where the president (Peter Sellers)
  >>  > has just learned that if the maverick B52 piloted by Slim Pickens gets
  >>  > through, a doomsday device that is supposed to deter first nuclear
  >>  > strikes will go off that will destroy the world.  Unfortunately, the
  >>  > Soviet Union didn't actually tell us that it was built.  Dr.
  >>  > Strangelove (Peter Sellers), after musing for a moment on the
  >>  > brilliance of the concept, turns and says in an increasingly shrill
  >>  > voice:
  >>
  >>  >   But...the whole point of the Doomsday Machine...is lost...if you keep
  >>  >   it a SECRET. Why didn't you tell the world, eh?
  >>
  >>  > Hmmm...;-)
  >>
  >>  >    rgb
  >>
  >>  >> Thanks for your suggested text!  (And the suggestion to
  >>  >> enhance our coverage of rsh/ssh usage... :-)
  >>
  >>  > Ya, well.  Just now finished telling the umptieth would-be PVM user how
  >>  > to go about it in an email message, augmenting further online docs such
  >>  > as this one:
  >>
  >>  >   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html
  >>
  >>  > which is actually pretty decent, although I generally use the ssh
  >>  > default dsa instead of rsa since on linux boxes it invariably works.
  >>  > But better than forcing each user to employ google to snarf out
  >>  > solutions to each problem they encounter, how much better to write a
  >>  > really nice "Getting Started with PVM" or perhaps better still, a "PVM
  >>  > HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
  >>  > plain sight in /usr/share/pvm3/PVM_HOWTO.
  >>
  >>  > Truthfully, good documentation, especially a walkthrough tutorial on
  >>  > getting started (including sample code or links to sample code) that
  >>  > takes a would-be user from "yum install pvm\*" to executing a Real
  >>  > Parallel Program (however trivial) on a two node cluster would really
  >>  > encourage the use of the library.  Adding a bit more (such as a PVM
  >>  > program development template) would be only icing on the cake, so to
  >>  > speak.
  >>
  >>  > If I had the time I'd write it myself.  I've already got a project_pvm
  >>  > program template up on the web, but it is sadly underdocumented through
  >>  > the setup of PVM itself.
  >>
  >>  >    rgb
  >>
  >>  >>
  >>  >> All the Best,
  >>  >>
  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
  >>  >>
  >>  >>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
  >>  >>  >>  > It would really, really help if man pvm (or man pvmd or man
  >>  >>  >>  > pvm_intro) documented a suitable firewall setting that will
  >>  >>  >>  > let PVM function without just turning off the firewall
  >>  >>  >>  > altogether.  There is no pvm setup in /etc/services, for
  >>  >>  >>  > example, no pvm checkbox in the panels managed by
  >>  >>  >>  > system-config-firewall in the latest Fedoras, no suggestion
  >>  >>  >>  > as to what protected port(s) or ranges one has to enable
  >>  >>  >>  > explicitly.  In fact for once even google is failing me --
  >>  >>  >>  > I'm not finding a lot of documentation or remarks by ANYONE
  >>  >>  >>  > on what ports pvm needs open (besides ssh, which obviously is
  >>  >>  >>  > open and works).  Usually as long as the spawning of a
  >>  >>  >>  > network application itself works using an enabled protected
  >>  >>  >>  > port (in this case, I would have expected ssh), the secondary
  >>  >>  >>  > ports opened in unprotected space just work.  Am I wrong in
  >>  >>  >>  > this?  Do I need to explicitly open more ports somewhere?
  >>  >>  >>
  >>  >>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use
  >>  >>  >> as many ports as you have machines in your cluster, or could use
  >>  >>  >> just 1.  :-}
  >>  >>  >>
  >>  >>  >> Normally, the master pvmd creates/accepts connections over a small
  >>  >>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a
  >>  >>  >> PVM application, then a myriad of direct-connection socket links
  >>  >>  >> are created, to link whichever machines the local PVM application
  >>  >>  >> tasks communicate with, on a demand-driven basis...
  >>  >>  >>
  >>  >>  >> So it's not generally possible to specify an explicit "range" of
  >>  >>  >> ports.  However, it _is_ possible to set the "starting" port for
  >>  >>  >> this collection, using the aforementioned "$PVMNETSOCKPORT"
  >>  >>  >> environment variable.
  >>  >>
  >>  >>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd
  >>  >>  > doesn't do the fork thing and clone a single open port on which it
  >>  >>  > listens into a dynamically allocated port that inherits from the
  >>  >>  > open one.  In principle one only needs a single port to be open to
  >>  >>  > connect to pretty much any network based application, or so I had
  >>  >>  > thought.  At least, I do that in xmlsysd and never have to punch
  >>  >>  > more than one porthole through a firewall.
  >>  >>
  >>  >>  > Hmmm, it's working sort of -- looks like I need to open UDP ports,
  >>  >>  > right, not TCP?  Having trouble on one host where I've punched the
  >>  >>  > hole but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm
  >>  >>  > trying again with the local environment variable set.
  >>  >>
  >>  >>  > Yup, that works.
  >>  >>
  >>  >>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why
  >>  >>  > does it need to do this on a client?  Can't the port(s) be passed
  >>  >>  > from the master when it starts up pvmd?
  >>  >>
  >>  >>  >> This sets the first port that PVM will try to use, and all
  >>  >>  >> subsequent ports will usually be consecutive positive increments
  >>  >>  >> of that starting port (i.e. PVMNETSOCKPORT++... :-).
  >>  >>  >>
  >>  >>  >> So in most cases, you could probably plan on opening up 100 or
  >>  >>  >> 1000 ports _somewhere_ in your firewall, depending on your needs,
  >>  >>  >> and then just tell PVM where to start, using $PVMNETSOCKPORT...
  >>  >>  >>
  >>  >>  >> I've always considered this solution a bit of a kludge, which is
  >>  >>  >> why it doesn't show up in the man pages, but if it works well
  >>  >>  >> enough to save people lots of hassle, then I can add some
  >>  >>  >> commentary on it...?
  >>  >>
  >>  >>  > Kludge or not, how can you have an environment variable in an
  >>  >>  > application and not provide knowledge of it or instructions on its
  >>  >>  > use in the man page?  Something like:
  >>  >>
  >>  >>  >  PVM requires open ports on target hosts to function.  Many hosts
  >>  >>  >  are installed with strong firewall rules by default.  If you
  >>  >>  >  install pvm on a slave and pvm appears to hang when you attempt
  >>  >>  >  to add it, eventually timing out without success, consider adding
  >>  >>  >  the following to your local personal or system environment (in,
  >>  >>  >  for example, ~/.bash_profile on all hosts):
  >>  >>
  >>  >>  >    PVMNETSOCKPORT=10000
  >>  >>  >    export PVMNETSOCKPORT
  >>  >>
  >>  >>  >  Then configure your firewall(s) to open a range of UDP ports
  >>  >>  >  starting at this value, such as 10000-11024 (the range need not
  >>  >>  >  be any larger than the largest number of machines you expect to
  >>  >>  >  have in your virtual machine).
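
The suggested man-page text above can be sketched as a small shell
fragment.  This is a sketch under assumptions, not PVM's documented
procedure: pvm_port_range is a hypothetical helper, the iptables rule is
only printed (not applied), and UDP is used because that is what worked
in this thread; direct task-to-task routes (PvmRouteDirect) may need the
matching TCP range as well.

```shell
#!/bin/sh
# Sketch only: pvm_port_range is a hypothetical helper, not part of PVM.

# Print "first:last" for a block of COUNT ports starting at START.
pvm_port_range() {
    start=$1
    count=$2
    echo "${start}:$((start + count - 1))"
}

# Starting port for PVM; must be set identically on every host.
PVMNETSOCKPORT=${PVMNETSOCKPORT:-10000}
export PVMNETSOCKPORT

# Print (do not run) a firewall rule covering 1025 ports from the start,
# which is the 10000-11024 example range when the start is 10000.
range=$(pvm_port_range "$PVMNETSOCKPORT" 1025)
echo "iptables -A INPUT -p udp --dport $range -j ACCEPT"
```

Printing the rule rather than applying it keeps the fragment safe to run
as an ordinary user; the count of 1025 is just the example range from the
text and can be shrunk to the number of hosts expected.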
  >>  >>
  >>  >>  > However a better solution still is to have the daemon fork on a
  >>  >>  > single "permanent" port address > 1024, e.g. 10000, and get a
  >>  >>  > negotiated connection in the upper (non-protected) port space that
  >>  >>  > way.
  >>  >>
  >>  >>  >> It may depend on the firewall settings, but a nice "Connection
  >>  >>  >> Refused" would usually go a long way toward diagnosing things,
  >>  >>  >> whereas the more secure firewall alternative of simply
  >>  >>  >> "no response" would only result in a "timed out" PVM message...
  >>  >>  >>
  >>  >>  >> I'm open to suggestions on ways to identify or diagnose the
  >>  >>  >> problem...!
  >>  >>
  >>  >>  > As I said, document EVERYTHING in the man page(s).  It is what it
  >>  >>  > is for.  Lots of users do, in fact, RTFM but get frustrated and
  >>  >>  > give up when they try something and it just doesn't work and they
  >>  >>  > can't see why.
  >>  >>
  >>  >>  > On the same line, a perennial problem with PVM is getting it to
  >>  >>  > work with rsh and ssh.  In fact, half the problems I help people
  >>  >>  > with who randomly write me is getting it to work with one or the
  >>  >>  > other.  The internal diagnostics are certainly very helpful, at
  >>  >>  > this point, but it would also be worth adding a new man page like
  >>  >>  > pvm_rsh that does nothing but walk users through the ritual of
  >>  >>  > setting PVM_RSH and establishing appropriate e.g. ssh keys.
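
The ritual mentioned above might look roughly like the following.  It is
a sketch under assumptions: pvm_use_ssh is a hypothetical helper (not
part of PVM), /usr/bin/ssh is an assumed fallback path, and user@node1 is
a placeholder host.  The key-setup commands are shown as comments because
they are interactive.

```shell
#!/bin/sh
# Sketch only: pvm_use_ssh is a hypothetical helper, not part of PVM.

# Point PVM_RSH at ssh so remote pvmds are started over ssh, not rsh.
pvm_use_ssh() {
    # Fall back to /usr/bin/ssh (an assumed path) if ssh is not on PATH.
    PVM_RSH=$(command -v ssh || echo /usr/bin/ssh)
    export PVM_RSH
}

pvm_use_ssh
echo "PVM_RSH=$PVM_RSH"

# Passwordless login is then set up once per user, for example:
#   ssh-keygen -t rsa          # key type is a choice; accept the default file
#   ssh-copy-id user@node1     # user@node1 is a placeholder
#   ssh user@node1 true        # should return without prompting
```

Putting the pvm_use_ssh call in the same startup file as PVMNETSOCKPORT
keeps both settings in one place on every host.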
  >>  >>
  >>  >>  > Just a thought or two.
  >>  >>
  >>  >>  >    rgb
  >>  >>
  >>  >>  >>
  >>  >>  >> Thanks Much for your interest and feedback!
  >>  >>  >>
  >>  >>  >> All the Best,
  >>  >>  >>
  >>  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
  >>  >>  >>
  >>  >>  >>  > I actually help a lot of people get started with PVM (they
  >>  >>  >>  > write me offline because I have a template PVM tarball up on
  >>  >>  >>  > my personal website) and the more I know, the better I can
  >>  >>  >>  > help them...;-)
  >>  >>  >>
  >>  >>  >>  >    rgb
  >>  >>  >>
  >>  >>  >>  > --
  >>  >>  >>  > Robert G. Brown                  Phone(cell): 1-919-280-8443
  >>  >>  >>  > Duke University Physics Dept, Box 90305
  >>  >>  >>  > Durham, N.C. 27708-0305
  >>  >>  >>  > Web: http://www.phy.duke.edu/~rgb
  >>  >>  >>  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
  >>  >>  >>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
  >>  >>  >>
  >>  >>  >>
  >>  >>  >> (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:
  >>  >>  >>
  >>  >>  >>   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!  They
  >>  >>  >>   Oak Ridge National Laboratory              still owe you money, Fool!"
  >>  >>  >>   kohlja at ornl.gov
  >>  >>  >>   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!
  >>  >>  >>
  >>  >>  >>
  >>  >>  >> :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)
  >>  >>  >>
  >>  >>
  >>  >>
  >>
  >>
