[Beowulf] PVM on wireless...

Thu Feb 7 10:53:04 PST 2008

Hi Robert/Rob/RGB!  :-)

  On Thu, Feb 07, 2008 at 12:55:31PM -0500, Robert G. Brown wrote:
  > On Wed, 6 Feb 2008, kohlja at ornl.gov wrote:
  >> Hey Gang!
  >> Sounds like you're having some "fun" with PVM over wireless...?  :-)
  >> (A buddy (Wael Elwasif) forwarded your discussion to me;
  >> please always feel free to copy "pvm at msr.csm.ornl.gov"
  >> with PVM inquiries when you get stuck.  I try to be
  >> pretty responsive, though this is all unfunded work now... :)

  > Bless you.

De nada, you're welcome.  :-)

  > However, I've just managed to figure the problem out on my own.  It is,
  > after all, a firewall issue... <snip/>

Ah, Good!  Glad that's all it was, not that it wasn't a hassle to identify! :)

Sorry it was so non-obvious from the PVM side of things...  :-b

  > While I've got the One True PVM Human(s) on the line, though...

Mwuahahahahahaaaa...  :-)

  > -- a suggestion for PVM to help others avoid this problem in the future
  > on networks wired and wireless:

  > It would really, really help if man pvm (or man pvmd or man pvm_intro)
  > documented a suitable firewall setting that will let PVM function
  > without just turning off the firewall altogether.  There is no pvm setup
  > in /etc/services, for example, no pvm checkbox in the panels managed by
  > system-config-firewall in the latest Fedoras, no suggestion as to what
  > protected port(s) or ranges one has to enable explicitly.  In fact
  > for once even google is failing me -- I'm not finding a lot of
  > documentation or remarks by ANYONE on what ports pvm needs open (besides
  > ssh, which obviously is open and works).  Usually as long as the
  > spawning of a network application itself works using an enabled
  > protected port (in this case, I would have expected ssh), the secondary
  > ports opened in unprotected space just work.  Am I wrong in this?  Do I
  > need to explicitly open more ports somewhere?

Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
many ports as you have machines in your cluster, or could use just 1.  :-}

Normally, the master pvmd creates/accepts connections over a small
set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
application, then a myriad of direct-connection socket links are
created, to link whichever machines the local PVM application tasks
communicate with, on a demand-driven basis...

So it's not generally possible to specify an explicit "range" of ports.
However, it _is_ possible to set the "starting" port for this collection,
using the aforementioned "$PVMNETSOCKPORT" environment variable.

This sets the first port that PVM will try to use, and all subsequent
ports will usually be consecutive positive increments of that starting
port (i.e. PVMNETSOCKPORT++... :-).

So in most cases, you could probably plan on opening up 100 or 1000
ports _somewhere_ in your firewall, depending on your needs, and then
just tell PVM where to start, using $PVMNETSOCKPORT...
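As a rough sketch of that bookkeeping (not anything PVM ships), the
following Python works out the inclusive port range for a chosen starting
port and count, prints matching firewall rules, and builds an environment
with $PVMNETSOCKPORT set.  The starting port 5000, the count of 100, the
helper names, the iptables rule form, and the choice to open both TCP and
UDP are all assumptions for illustration; adjust them to your own site.

```python
import os


def pvm_port_range(start, count):
    """Return (first, last) for `count` consecutive ports starting at `start`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    last = start + count - 1
    if start < 1 or last > 65535:
        raise ValueError("port range must lie within 1-65535")
    return start, last


def iptables_rules(start, count):
    """Build example iptables commands that accept the whole range.

    Opening both TCP and UDP is an assumption; trim to what you observe.
    """
    first, last = pvm_port_range(start, count)
    return [
        f"iptables -A INPUT -p {proto} --dport {first}:{last} -j ACCEPT"
        for proto in ("tcp", "udp")
    ]


def pvm_environment(start, base_env=None):
    """Return a copy of the environment with PVMNETSOCKPORT set to `start`."""
    env = dict(os.environ if base_env is None else base_env)
    env["PVMNETSOCKPORT"] = str(start)
    return env


if __name__ == "__main__":
    # Hypothetical choice: 100 ports starting at 5000.
    for rule in iptables_rules(5000, 100):
        print(rule)
    print("PVMNETSOCKPORT =", pvm_environment(5000)["PVMNETSOCKPORT"])
```

The rules are only printed, not applied, so you can review them before
touching the firewall; the returned environment could then be handed to
whatever launches PVM on each host.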

I've always considered this solution a bit of a kludge, which is why
it doesn't show up in the man pages, but if it works well enough to
save people lots of hassle, then I can add some commentary on it...?

  > To find out, this leaves me with running e.g. tcpdump and watching as
  > pvm attempts to connect, opening port ranges one at a time and doing a
  > binary search, or something similarly painful.  Or just asking you.  So
  > what (minimal set of) ports do I need to leave open besides ssh, which
  > is always open on my systems anyway?

  > An additional suggestion would be to (if possible) have the RPM install
  > "fix" the port situation so that pvm shows up on system-config-firewall
  > and/or finish with a message to the installer that a particular firewall
  > setting must be installed or enabled and/or add something to the
  > debugging info provided by pvm so that on a timeout (in particular) it
  > prints something like "Unable to connect due to timeout.  Verify that
  > pvm is correctly installed and that port range xxxx-xxxx is open on the
  > target."

You _should_ be getting some sort of timeout message in the slave
pvmd's log file (/tmp/pvml.<uid> on the slave machine), when the
connection request to the master pvmd doesn't get a reply...?

It may depend on the firewall settings, but a nice "Connection
Refused" would usually go a long way toward diagnosing things,
whereas the more secure firewall alternative of simply
"no response" would only result in a "timed out" PVM message...
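To tell those two cases apart from outside PVM, a small probe along these
lines might help.  It is only a sketch: it tests a single TCP port, so it
says nothing about UDP traffic, and the function name, the 3-second
default timeout, and the host and port in the usage lines are assumptions
for illustration.

```python
import socket


def probe_tcp_port(host, port, timeout=3.0):
    """Classify one TCP connection attempt to host:port.

    Returns "open" if the connection succeeds, "refused" if the target
    actively rejects it, "timeout" if nothing answers in time (typical of
    a firewall that silently drops packets), or "error: ..." otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical target: the local machine, port 5000.
    print(probe_tcp_port("127.0.0.1", 5000))
```

A "refused" result means the packet reached the host and no one was
listening, while a "timeout" points more toward a firewall dropping the
traffic before it arrives.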

I'm open to suggestions on ways to identify or diagnose the problem...!

Thanks Much for your interest and feedback!

All the Best,

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

  > I actually help a lot of people get started with PVM (they write me
  > offline because I have a template PVM tarball up on my personal website)
  > and the more I know, the better I can help them...;-)

  >    rgb

  > -- 
  > Robert G. Brown                            Phone(cell): 1-919-280-8443
  > Duke University Physics Dept, Box 90305
  > Durham, N.C. 27708-0305
  > Web: http://www.phy.duke.edu/~rgb
  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:

   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!  They
   Oak Ridge National Laboratory              still owe you money, Fool!"
   kohlja at ornl.gov
   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!

:):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)