[Beowulf] PVM on wireless...
reuti at staff.uni-marburg.de
Wed Feb 6 11:52:09 PST 2008
On 06.02.2008 at 19:21, Robert G. Brown wrote:
> On Wed, 6 Feb 2008, Bill Rankin wrote:
>> Hey Rob,
>> Could it be a node naming issue where the wireless IP does not
>> resolve to the same address as that used in the machinefile? I
>> seem to recall a similar issue back when we ran PVM on machines with
>> multiple network connections.
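A quick way to rule the naming question in or out is to ask the resolver directly what a host name maps to and compare that with the address expected on the wireless interface. The following is only a minimal sketch: the helper names `resolved_addresses` and `matches` are made up for illustration, and the name and address in the usage note are placeholders, not values from this thread.

```python
import socket


def resolved_addresses(name):
    """Return the set of IPv4 addresses the local resolver reports for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def matches(name, expected_ip):
    """True if `name` resolves to `expected_ip` (hypothetical helper)."""
    return expected_ip in resolved_addresses(name)
```

Run on both the master and the slave, something like `matches("node-wifi", "192.168.1.23")` (placeholder values) should return True on each side if the name in the hostfile really points at the wireless address.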
> pvmd is actually starting up on the target machine -- it works that far.
> The master node IP number is correct, as is the slave IP number (both
> visible as arguments to pvmd). The name I'm using is the one associated
> with the wireless interface in question, both machines ping in all
> directions by name with the correct internet address. All my machines
> are configured more or less identically, use the same environment
> variables, support transparent ssh command execution (which obviously
> works even in PVM as the daemon is being spawned on the correct host).
> The wireless interfaces have the right MTU and look exactly like the
> ethernet devices they in fact are to the kernel AFAIK. In every other
> aspect I've ever tested, including my own homemade socket code, connections
> to both TCP and UDP daemons, ability to mount NFS, support ssh, and so
> on and so forth, they behave like TCP/IP sockets over ethernet devices
> as far as system calls go -- they use the same interface, and the
> point of OSI/ISO is that code should not depend on the hardware layer
> and in general on even a roughly POSIX-compliant machine using
> devices and e.g. the socket API, it doesn't.
> Last time I encountered this, I actually cranked up the -d0x0 stuff and
> "watched" as the system went through to where it hung in the middle of
> doing some part of the post-spawn handshaking.
Just an idea to check: PVM can also be started without rsh/ssh
between the machines. You have to copy and paste some things from
here to there and back, and can start up all the daemons by hand this way
(page 30 in the PVM book). Maybe this works; it would at least help to
narrow down the cause.
> I suspect a race condition, probably caused by using raw UDP with some
> assumption of latency during the handshake. The one way I can think of
> that the two connections differ is in their latency -- even the
> bandwidth of wireless is every bit as great as 10B2 networks I've run
> PVM on in years past (on proportionally slower CPUs, of course). If the
> master or slave sends out an acknowledgement packet either before the
> window where the other can receive it or after it has grown bored and
> stopped listening, it might fail to properly bind or something. It
> seems like it would be a bug, not a feature, but if I were feeling
> infinitely masochistic and were to wander down into Other People's
> Source (ouch!) to try to debug this, that's what I'd look for first.
> Any PVM developers still on list? Any comments from them?
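To test the latency theory without digging into the PVM source, one could compare plain UDP round-trip times and loss over the wired and the wireless link. The following is a rough sketch of such a probe and not PVM's actual handshake: the function names, the payload, and the timeout values are all assumptions chosen for illustration. `loopback_demo` only shows that the two halves work together on one machine.

```python
import socket
import threading
import time


def echo_server(sock, stop):
    """Echo every datagram back to its sender until `stop` is set."""
    sock.settimeout(0.2)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def probe(host, port, count=20, timeout=1.0):
    """Send `count` datagrams; return round-trip times in seconds (None = lost)."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for i in range(count):
            payload = str(i).encode()
            start = time.perf_counter()
            s.sendto(payload, (host, port))
            try:
                # Skip late replies to earlier probes; wait for our own payload.
                while True:
                    data, _ = s.recvfrom(2048)
                    if data == payload:
                        break
                rtts.append(time.perf_counter() - start)
            except socket.timeout:
                rtts.append(None)
    return rtts


def loopback_demo(count=5):
    """Run server and probe on 127.0.0.1 as a self-check of the sketch."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    stop = threading.Event()
    thread = threading.Thread(target=echo_server, args=(server, stop), daemon=True)
    thread.start()
    try:
        return probe("127.0.0.1", port, count)
    finally:
        stop.set()
        thread.join()
        server.close()
```

For a real comparison, `echo_server` would run on the slave bound to a fixed port and `probe` on the master, once against the wired name and once against the wireless one. A much wider spread of round-trip times, or any `None` entries, on the wireless side would at least be consistent with a timing-sensitive handshake.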
>> Just a thought,
>> On Feb 6, 2008, at 10:40 AM, Robert G. Brown wrote:
>>> Anybody on list have any idea why PVM fails to add hosts over a
>>> wireless link? I've now tried this over multiple distro versions and
>>> at least one PVM update, and it just doesn't work. Works fine over a
>>> wire, fails on
>>> wireless, and as far as I know wire and wireless are both
>>> at the kernel interface layer so that any e.g. socket one might open is
>>> absolutely ecumenical about what the underlying hardware is (good
>>> ISO/OSI layering, right?).
> Robert G. Brown Phone(cell): 1-919-280-8443
> Duke University Physics Dept, Box 90305
> Durham, N.C. 27708-0305
> Web: http://www.phy.duke.edu/~rgb
> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977