[Beowulf] Running MPICH on an AMD Opteron dual-core cluster (72 CPUs)

Vadivelan Rathinasabapathy r.vadivelanrhce at gmail.com
Fri Dec 29 02:26:55 PST 2006


Dear all,

We have a problem running applications that are compiled with MPICH. Our
setup is a 16-node, 72-CPU AMD Opteron cluster with Rocks 4.1.2 and
RHEL 4.0 Update 4 installed.

We are trying to run a benchmark with the MPICH that came with the
Rocks installation. The run starts, and after some time the following
error occurs:

p1_8544: p4_error: Timeout in Establishing connection to remote process: 0
rm_l_1_8667: (359.417969) net_send: could not write to fd=5, errno=104
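
For reference, the errno value in the net_send line can be decoded with a
few lines of Python. This is only a minimal sketch: the helper name
describe_errno is made up for illustration, and errno numbers are
platform-specific, so it should be run on one of the cluster nodes. On
Linux, 104 corresponds to ECONNRESET (connection reset by peer).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform.

    Hypothetical helper for illustration; not part of MPICH or Rocks.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported in the net_send line of the error output.
    name, message = describe_errno(104)
    print(f"errno 104: {name} ({message})")
```

A reset connection suggests the remote side closed or dropped the socket,
which would fit a remote process that never started or exited early.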

We have been working on this for the past two days and have not found a
solution.

We also downloaded the latest MPICH, 1.2.7p1, and configured it. With this
version, the same test appears to run only on the master server, no matter
how many processors we request.
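
One thing that may be worth checking for the master-only behaviour is
whether the new MPICH build is being given a machine file at all. The
sketch below writes one in the "host:cpus" form that, to our understanding,
the ch_p4 device accepts. The host names and CPU counts are placeholders,
not our real node list, and build_machinefile is a hypothetical helper.

```python
def build_machinefile(nodes):
    """Build machine-file text from (hostname, cpu_count) pairs.

    Assumes the "host" / "host:n" line format; names and counts passed in
    are the caller's responsibility (the ones below are placeholders).
    """
    lines = []
    for host, cpus in nodes:
        if cpus < 1:
            raise ValueError(f"cpu count for {host} must be at least 1")
        # A bare host name means one process slot; host:n means n slots.
        lines.append(host if cpus == 1 else f"{host}:{cpus}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder nodes for illustration only.
    example_nodes = [("compute-0-0", 4), ("compute-0-1", 4)]
    with open("machines", "w") as handle:
        handle.write(build_machinefile(example_nodes))
```

The resulting file could then be passed explicitly, for example
"mpirun -np 8 -machinefile machines ./benchmark", where ./benchmark stands
in for the actual binary.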

The same test works fine with LAM/MPI and Open MPI. Please suggest a
good solution.
-- 
Thanks and Regards

R.Vadivelan
CMC Ltd,
Bangalore
r.vadivelanrhce at gmail.com