[bproc] MPI chokes

Jag agrajag at linuxpower.org
Thu Mar 15 08:10:41 PST 2001


On Thu, 15 Mar 2001, Arthur H. Edwards wrote:

> > Based on the error messages from your previous message, it looks like it
> > is trying to rfork to a node that is down.  What does the output of
> > 'bpstat' on your cluster look like?
> > 
> > 
> > Jag
> 
> Here is the output from bpstat
> 
> jarrett/home/edwardsa>bpstat
> Node    Address         Status
> 0       192.168.1.100   up
> 1       192.168.1.101   up
> 2       192.168.1.102   up
> 3       192.168.1.103   up
> 4       192.168.1.104   up
> 5       192.168.1.105   up
> 6       192.168.1.106   up
> 7       192.168.1.107   down
> 8       192.168.1.108   down
> 9       192.168.1.109   down

<snip>

OK. You seem to be running Scyld's PREVIEW release (27BZ-6).  At the
end of January, Scyld put out an actual release (27BZ-7).  That release
included updated software, including an updated beompi, which is
Scyld's MPI package.

I never tried to run MPI programs on the preview release, but my guess
is that it is getting confused by all the "down" nodes.  I've played
with MPI on the 27BZ-7 release and have had no problems when there were
down nodes, so I would recommend upgrading to the latest release.
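
If you want to see at a glance which nodes a job could land on, here is
a minimal Python sketch that sorts nodes into "up" and "down".  It
assumes the three-column layout shown in the bpstat output quoted above
(node number, address, status); the function name parse_bpstat is just
made up for this example and is not part of bproc.

```python
def parse_bpstat(text):
    """Return {node_number: (address, status)} from bpstat-style text.

    Assumes the three-column layout quoted in this thread:
    node number, IP address, status.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not "<number> <address> <status>".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        nodes[int(fields[0])] = (fields[1], fields[2])
    return nodes


# Sample input copied from the bpstat output quoted earlier in this message.
SAMPLE = """\
Node    Address         Status
0       192.168.1.100   up
1       192.168.1.101   up
2       192.168.1.102   up
3       192.168.1.103   up
4       192.168.1.104   up
5       192.168.1.105   up
6       192.168.1.106   up
7       192.168.1.107   down
8       192.168.1.108   down
9       192.168.1.109   down
"""

nodes = parse_bpstat(SAMPLE)
up_nodes = sorted(n for n, (_, status) in nodes.items() if status == "up")
down_nodes = sorted(n for n, (_, status) in nodes.items() if status == "down")

print("up:", up_nodes)
print("down:", down_nodes)
```

On a live cluster you would feed it the real output of 'bpstat' instead
of the pasted sample.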

Also, the reason you have so many "down" nodes is that you gave it a
large IP range to use for slave nodes.  If you want fewer "down" nodes
(which are really just nodes that don't exist), run the beosetup
program, click on preferences, and adjust the IP range so that it
contains exactly as many IPs as you have slave nodes.
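
To work out where the range should end, the arithmetic is just "first
address plus (number of nodes - 1)".  A small Python sketch, assuming
the range is one contiguous block of IPv4 addresses; the helper name
slave_ip_range is hypothetical, and the 7 nodes are simply the ones that
show as "up" in the bpstat output above:

```python
import ipaddress


def slave_ip_range(first_ip, node_count):
    """Return (first, last) addresses of a contiguous block of node_count IPs."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    first = ipaddress.IPv4Address(first_ip)
    # The block includes the first address, so the last one is count - 1 away.
    last = first + (node_count - 1)
    return str(first), str(last)


# Assumption for illustration: 7 slave nodes starting at 192.168.1.100.
first, last = slave_ip_range("192.168.1.100", 7)
print(first, "-", last)
```

Those two addresses are what you would then type into the IP range
setting in beosetup.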

Hope this helps,


Jag