trouble starting pvm:urgent

Robert G. Brown rgb at phy.duke.edu
Tue May 7 11:48:17 PDT 2002


On Tue, 7 May 2002, ahmad rizvi wrote:

> Hi all:
> I am running pvm3.4 on solaris machines. When I try to add a machine from the pvm prompt it says:
> 
> pvm> add mos
> mos: Can't start pvmd
> 
> I try to start the pvm daemon from the command line manually using rsh:
> > rsh mos $PVM_ROOT/lib/pvmd -s -d8 -nhonk 1 08:00:00:9f:88:aa 4096 3 08:00:00:9f:88:aa
> [pvmd pid8816] 05/07 15:14:47 version 3.4.0
> [pvmd pid8816] 05/07 15:14:47 ddpro 2316 tdpro 1318
> [pvmd pid8816] 05/07 15:14:47 main() debug mask is 0x8 (slv)
> [pvmd pid8816] 05/07 15:14:47 mksocs() bind netsock: Cannot assign requested address
> [pvmd pid8816] 05/07 15:14:47 pvmbailout(0)
> [pvmd pid8816] 05/07 15:14:47 sending FIN|ACK to all pvmds
> 
> I think there is a socket error as it says: mksocs() bind netsock: ......
> Is this because of permissions on the network?

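It's hard to say without more data.  A failed bind() generally means
one of three things.  You may be trying to bind to a port in the range
usually reserved for root (<1024) without permission to do so.  You may
be trying to bind to a port that already has something listening.  Or
you may be trying to bind to an address that isn't configured on the
host.  Strictly speaking, "Cannot assign requested address" is the
system's message for the last case (EADDRNOTAVAIL).  The first two
usually show up as "Permission denied" and "Address already in use".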
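To tell these cases apart outside of PVM, you can reproduce the bind()
call by hand and look at the errno it returns.  The block below is a
minimal sketch that assumes Python 3 on a Unix-like host.  try_bind is a
hypothetical helper, not part of PVM.  It binds a UDP socket because
pvmds talk to each other over UDP.  The errno names are the standard
ones: EACCES for a privileged port without root, EADDRINUSE for a port
already in use, and EADDRNOTAVAIL ("Cannot assign requested address")
for an address the host doesn't have.

```python
import errno
import socket


def try_bind(host, port):
    """Try to bind a UDP socket to (host, port) and report the outcome.

    Returns "ok" on success, otherwise the symbolic errno name, e.g.
    "EACCES" (privileged port without root), "EADDRINUSE" (port taken)
    or "EADDRNOTAVAIL" (address not configured on this host).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 192.0.2.1 is a reserved documentation address, normally not local.
    print(try_bind("192.0.2.1", 0))

    # Hold a port open, then try to bind the same port a second time.
    holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    holder.bind(("127.0.0.1", 0))
    busy_port = holder.getsockname()[1]
    print(try_bind("127.0.0.1", busy_port))
    holder.close()
```

To check the host that is failing, substitute its own addresses and
ports for the examples.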

The first message ("Can't start pvmd") is pretty generic.  pvm 3.4.3
(which I'm running at the moment to test) provides much more useful
diagnostic output on failure.  For example (I had to create an account
that I knew would fail):

rgb@lilith|T:104>su
Password: 
rgb@lilith|T:1#adduser dummy
rgb@lilith|T:2#su - dummy
[dummy@lilith dummy]$ pvm
pvm> add lucifer
add lucifer
0 successful
                    HOST     DTID
                 lucifer Can't start pvmd

Auto-Diagnosing Failed Hosts...
lucifer...
Verifying Local Path to "rsh"...
Rsh found in /usr/bin/rsh - O.K.
Testing Rsh/Rhosts Access to Host "lucifer"...

Rsh/Rhosts Access FAILED - "lucifer.rgb.private.net: Connection refused"
Make sure host lucifer is up and connected to
a network and check its DNS / IP address.
Also verify that lilith is allowed
rsh access on lucifer
Add this line to the $HOME/.rhosts on lucifer:
lilith dummy

which is a LOT more useful than the message you got.  A more common
source of failure is trying to start a pvmd on a host that already has
one running.  This would likely produce a bind error like the one you
describe, as a new pvmd cannot bind to the usual socket: something (the
other pvmd) is already there.  With 3.4.3 that case traces out to:

0 successful
                    HOST     DTID
                 lucifer Can't start pvmd

Auto-Diagnosing Failed Hosts...
lucifer...
Verifying Local Path to "rsh"...
Rsh found in /usr/bin/ssh - O.K.
Testing Rsh/Rhosts Access to Host "lucifer"...
Rsh/Rhosts Access is O.K.
Checking O.S. Type (Unix test) on Host "lucifer"...
Host lucifer is Unix-based.
Checking $PVM_ROOT on Host "lucifer"...
$PVM_ROOT on lucifer Appears O.K. ("/usr/share/pvm3")
Verifying Location of PVM Daemon Script on Host "lucifer"...
PVM Daemon Script Found ("/usr/share/pvm3/lib/pvmd")
Determining PVM Architecture on Host "lucifer"...
$PVM_ARCH not set on lucifer
Manually Determining PVM Architecture on Host "lucifer"...
$PVM_ARCH for lucifer is LINUX.
Verifying Existence of PVM Daemon Executable on Host "lucifer"...
PVM Daemon Executable Found ("/usr/share/pvm3/lib/LINUX/pvmd3")
Determining PVM Temporary Directory on Host "lucifer"...
$PVM_TMP not set on lucifer
Assuming /tmp.
Checking for Leftover PVM Daemon Files on Host "lucifer"...

PVM Daemon Files Found on lucifer!
Either PVM is Already Running or else it
crashed and left behind a /tmp/pvmd.<uid>
daemon file.
Halt PVM if it is running on lucifer, or else
remove any leftover /tmp/pvmd.* files.

...which also shows pretty much how you can manually diagnose your
problem.
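The filesystem checks near the end of that trace are easy to repeat by
hand on the remote host.  The block below is a minimal sketch that
assumes Python 3 is available there.  It also assumes the layout shown in
the trace: $PVM_ROOT/lib/pvmd, $PVM_ROOT/lib/$PVM_ARCH/pvmd3, $PVM_TMP
defaulting to /tmp, and leftover files named pvmd.<uid>.
check_pvm_host is a hypothetical helper.  Unlike pvm, it does not work
out $PVM_ARCH for you.

```python
import os
from pathlib import Path


def check_pvm_host(pvm_root, pvm_arch, pvm_tmp=None, uid=None):
    """Repeat the filesystem checks from pvm's auto-diagnosis.

    Returns a list of problem descriptions; an empty list means none of
    these particular checks found anything wrong.
    """
    problems = []
    root = Path(pvm_root)
    # The startup script that the remote shell runs on the host.
    script = root / "lib" / "pvmd"
    if not script.is_file():
        problems.append(f"missing daemon script {script}")
    # The architecture-specific daemon binary.
    binary = root / "lib" / pvm_arch / "pvmd3"
    if not binary.is_file():
        problems.append(f"missing daemon executable {binary}")
    # Like pvm, assume /tmp when PVM_TMP is not set.
    tmp = Path(pvm_tmp or os.environ.get("PVM_TMP") or "/tmp")
    if uid is None:
        uid = os.getuid()
    leftover = tmp / f"pvmd.{uid}"
    if leftover.exists():
        problems.append(
            f"{leftover} exists: halt the running pvmd or remove the stale file"
        )
    return problems


if __name__ == "__main__":
    # Fallback values taken from the lucifer example; adjust for your host.
    root = os.environ.get("PVM_ROOT", "/usr/share/pvm3")
    arch = os.environ.get("PVM_ARCH", "LINUX")
    for problem in check_pvm_host(root, arch):
        print(problem)
```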

I'd suggest FIRST upgrading to pvm 3.4.3 (or the latest stable release),
doing a new build for your systems if necessary.  For Linux systems you
should be able to just use current RPMs; my pvm came straight out of
standard Red Hat 7.2, for example.  Read the new documentation.  In
addition to auto-diagnosing most failures AND telling you how to fix
them (awesome concept!), the new release supports a variety of new
environment variables (such as PVM_RSH and PVM_TMP) that can be used to
configure intelligent behavior in certain environments.
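As an illustration of how PVM_RSH and PVM_TMP come into play, the sketch
below reports the remote shell and temporary directory a setup would end
up with.  It also checks whether that remote shell is on the local PATH,
which is the same thing the "Verifying Local Path" step checks.  It
assumes Python 3, and effective_pvm_settings is hypothetical.  The
fallbacks ("rsh" and "/tmp") are also assumptions, since the real
default remote shell is fixed when PVM is built.

```python
import os
import shutil


def effective_pvm_settings(env=None):
    """Report the remote shell and temp directory a PVM setup would use.

    Assumed fallbacks: PVM_RSH -> "rsh", PVM_TMP -> "/tmp".  The real
    defaults are chosen when PVM is built and may differ.
    """
    if env is None:
        env = os.environ
    rsh = env.get("PVM_RSH") or "rsh"
    tmp = env.get("PVM_TMP") or "/tmp"
    return {
        "rsh": rsh,
        # Full path of the remote shell, or None if it isn't on PATH.
        "rsh_path": shutil.which(rsh, path=env.get("PATH")),
        "tmp": tmp,
    }


if __name__ == "__main__":
    for key, value in effective_pvm_settings().items():
        print(f"{key}: {value}")
```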

Hope this helps...

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






More information about the Beowulf mailing list