Need help setting up MPI on a cluster

Ron Choy cly at MIT.EDU
Thu Feb 28 13:35:42 PST 2002

(These are two posts that I made to the mpi-bugs address and to
comp.parallel.mpi.  I got no reply, so I am trying my luck here ...)
(There are two questions: the first is about setting up ch_p4mpd, the
second is about serv_p4 in ch_p4.  Solving either one of them is good
enough for me - nolocal starts really slowly right now!)

I am running MPICH 1.2.3 on a cluster of 9 nodes, each with 2 Athlon
MP processors.  I installed MPICH with ch_p4mpd on the frontend, and
copied the binaries over to the compute nodes.  The configure options
I used are:

--with-device=ch_p4mpd --prefix=/usr/local/mpich-mpd -rsh=ssh

Then I set up the mpd ring by running mpd on the frontend, and then
mpd -h frontend-0 -p <the port I got> -b

on each compute node.
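
The per-node step above can be scripted. Here is a minimal sketch in Python that only builds and prints the ssh commands rather than running them. It assumes passwordless ssh and that the compute nodes are named compute-0-0 through compute-0-7 (only some of those names appear in this message). The port 12345 is a placeholder for whatever port mpd reports on the frontend, and mpd_join_commands is a helper name made up for illustration.

```python
import shlex


def mpd_join_commands(hosts, frontend, port):
    """Build one ssh command per host that joins it to the mpd ring."""
    # Same mpd invocation as above: -h frontend host, -p port, -b background.
    remote = f"mpd -h {frontend} -p {port} -b"
    return [["ssh", host, remote] for host in hosts]


if __name__ == "__main__":
    # Assumed node naming and count; adjust to the real cluster.
    hosts = [f"compute-0-{i}" for i in range(8)]
    # 12345 is a placeholder for the port printed by mpd on the frontend.
    for cmd in mpd_join_commands(hosts, "frontend-0", 12345):
        print(shlex.join(cmd))
```

Printing first makes it easy to check the commands before handing each list to something like subprocess.run.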

Tests I tried:
tstmachine works fine.
mpdringsize gives me 9 (correct)
mpdringtest works
mpdtrace gives out something sensible. (the nodes form a ring)
A hello-world-type program runs fine (with net_recv errors at the end).
That program makes no MPI_Send or MPI_Recv calls.

But when I try the cpi program in examples, I get
[cly at frontend-0 cly]$ mpirun -np 2 ./cpi
Process 0 on frontend-0
Process 1 on compute-0-7
p1_26310: (2.019748) net_recv failed for fd = 12
p1_26310:  p4_error: net_recv read, errno = : 111

This happens for any program that involves Send and Recv (Send, Recv,
Bcast, etc. never complete).  Any insights?  Did I do anything wrong in
the setup?
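
For what it is worth, on Linux errno 111 is ECONNREFUSED (connection refused), which would suggest that process 1 tried to connect to a port where nothing was listening. Below is a quick way to look up an errno value, sketched in Python. The language is my choice here, describe_errno is a made-up helper name, and the numeric mapping is platform-specific, so it should be run on the same kind of system as the nodes.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 111 is the value reported by p4_error in the output above.
    name, text = describe_errno(111)
    print(name, "-", text)
```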


Having failed with ch_p4mpd (see my previous email), I am trying to use
serv_p4 with ch_p4, but I am running into this problem:

[root at frontend-0 sbin]# ./chp4_servs
starting /usr/local/mpich/bin/serv_p4 on compute-0-0 with 1234

[cly at frontend-0 sbin]$ ./chkserv
Bad message from compute-0-0: :Password

I configured with -rsh=ssh, and my system is set up so that ssh requires
no password (mpirun works perfectly without serv_p4).
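
One thing that can be checked independently of chkserv is whether anything accepts connections on the port that chp4_servs reported (1234 on compute-0-0). This is a minimal sketch in Python. It only tests that a TCP connection can be opened and knows nothing about the serv_p4 protocol or its password prompt. port_accepts_connections is a made-up helper name.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the chp4_servs output above.
    print(port_accepts_connections("compute-0-0", 1234))
```

If this prints False, the problem is more likely that serv_p4 never started or is unreachable than anything to do with passwords.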

I tried searching on the web but I can't find any useful information.

And I can't seem to find any documentation on serv_p4, on its options, or
on why it's asking for a password (searching for "serv_p4 password" on
Google doesn't yield anything sensible).

Any ideas?
