MPICH-1.2.2.3 Problem

Gabriel J. Weinstock gabriel.weinstock at dnamerican.com
Wed Oct 24 12:12:21 PDT 2001


  I'm trying to get MPICH 1.2.2.3 running on a 4-node cluster of PIII 1 GHz 
machines. The tstmachines program runs without error, and the rsh mechanism is 
set up and functioning properly. LAM-MPI works out of the box, so we decided 
to use that for a while, but we're going to need a production environment and 
MPICH seemed more suitable.
  Anyway, I compile the example `cpi.c' program and run `mpirun -v -np 4 
cpi'. Nothing happens for a few minutes, then I get a flurry of `Connection 
failed for reason: : Connection timed out' messages, followed by:

p1_10899: p4_error: Timeout in establishing connection to remote process: 0
p3_15707: p4_error: net_recv read: probable EOF on socket: 1
bm_list_4303: (378.120857) Listener: Unable to interrupt client pid=4302.
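
The p4 messages above refer to failed socket connections between processes. 
One way to narrow this down is to check, independently of MPICH, whether a 
plain TCP connection from one node to another succeeds within a timeout. 
Below is a minimal sketch in C, assuming POSIX sockets; `probe_tcp` is a 
hypothetical helper, not part of MPICH or p4, and the host and port you pass 
it are placeholders you would choose yourself.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not an MPICH API): returns 0 if a TCP connection
   to host:port is established within timeout_sec seconds, -1 otherwise. */
int probe_tcp(const char *host, const char *port, int timeout_sec)
{
    struct addrinfo hints, *res, *ai;
    int result = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;  /* name resolution failed */

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL && result != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking connect so we can enforce our own timeout. */
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        int rc = connect(fd, ai->ai_addr, ai->ai_addrlen);
        if (rc == 0) {
            result = 0;  /* connected immediately */
        } else if (errno == EINPROGRESS) {
            fd_set wfds;
            struct timeval tv = { timeout_sec, 0 };
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            /* Socket becomes writable when the connect attempt finishes. */
            if (select(fd + 1, NULL, &wfds, NULL, &tv) == 1) {
                int err = 0;
                socklen_t len = sizeof err;
                /* SO_ERROR holds the outcome of the pending connect. */
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0
                    && err == 0)
                    result = 0;
            }
        }
        close(fd);
    }
    freeaddrinfo(res);
    return result;
}
```

If a probe like this between two nodes also hangs and then fails, the 
problem is likely in the network setup rather than in MPICH itself; if it 
succeeds, the cause is more likely specific to how p4 is connecting.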

  We had a similar problem about two months ago, which led us to abandon this 
implementation. A number of people seem to be having this problem, but 
no one, and I mean no one, seems to know the answer. Any help would be 
greatly appreciated.
Thanks,
Gabe
