[Beowulf] mpich mpd ring on a network of 2 pcs

Manal Helal manalorama at gmail.com
Sat Dec 30 08:21:18 PST 2006


I am trying to set up a small cluster incrementally, to run MPI programs 
only. I have 4 PCs with Fedora Core Linux: 2 with Core 5 and one with 
Core 6, and I will install Core 6 on the new one.

I installed MPICH2 on the Fedora Core 6 machine, and I can run mpd and 
MPI programs on this machine fine. I can also ping and ssh from and to 
all machines.

Then I added an SMB share for the install bin path, and I can access it 
from the other machine. I updated the mpd.hosts file (in the user's home 
folder on the MPICH2 installation machine) with the names of both 
machines for now. (I copied .mpd.conf to the user's home folder on both 
machines, and did the same with mpd.hosts; I am not sure if this is 
right or not.)
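
For reference, a minimal sketch of the two per-user files described 
above. It assumes that the configuration file is named .mpd.conf, that 
it holds a single MPD_SECRETWORD line (older MPICH2 releases are said to 
use secretword= instead), and that mpd wants it accessible by its owner 
only. The helper name write_mpd_files and the secret word are 
placeholders and not part of MPICH2; manallpt and manal are the two host 
names that appear in the shell prompts further down.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): write .mpd.conf and mpd.hosts
# into the directory given as $1, normally the user's home folder.
# Usage: write_mpd_files DIR SECRETWORD HOST [HOST...]
write_mpd_files() {
    dir=$1
    secret=$2
    shift 2

    # Assumption: mpd reads MPD_SECRETWORD from .mpd.conf and expects the
    # file to be accessible by its owner only (mode 600).
    rm -f "$dir/.mpd.conf"
    ( umask 077; printf 'MPD_SECRETWORD=%s\n' "$secret" > "$dir/.mpd.conf" )
    chmod 600 "$dir/.mpd.conf"

    # mpd.hosts: one host name per line.
    printf '%s\n' "$@" > "$dir/mpd.hosts"
}

# Example (commented out so nothing in $HOME is overwritten by accident):
# write_mpd_files "$HOME" "some-secret-word" manallpt manal
```

The same secret word would presumably have to be used on every machine 
that is meant to join the ring.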

On the second machine, I can read and write the MPICH2 bin folder, but 
the only command I can run is mpd. When I then try mpdtrace, it says no 
mpd is running.

When I run mpd on the installation machine, I can mpdtrace it and get 
the port number. When I then run mpd -h hostname -p port & on the other 
machine, I receive:
[1] 7007
[mhelal at manal mhelal]# manal.localhits_45668: conn error in connect_rhs: 
Connection refused
manal.localhits_45668 (connect_rhs 726): failed to connect to rhs at 56317
manal.localhits_45668 (enter_ring 633): rhs connect failed
manal.localhits_45668 (run 245): failed to enter ring
On the installation machine I keep getting: lost rhs; re-entering 
ring ..... back in ring
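
The manual join attempted above can be sketched as follows. This is an 
illustration only: it assumes that each line printed by mpdtrace -l has 
the form "hostname_port (ip-address)", and parse_mpd_entry is a 
hypothetical helper, not an MPICH2 command. The mpd and mpdtrace 
invocations are left as comments because they need a working MPICH2 
installation.

```shell
#!/bin/sh
# Hypothetical helper: split one line of `mpdtrace -l` output, assumed to
# look like "hostname_port (ip-address)", into "hostname port".
parse_mpd_entry() {
    entry=${1%% *}                                  # drop the " (ip)" part
    printf '%s %s\n' "${entry%_*}" "${entry##*_}"   # split at the last "_"
}

# On the installation machine (first mpd in the ring):
#   mpd &
#   mpdtrace -l
#
# On the second machine, using the line printed by mpdtrace -l:
#   set -- $(parse_mpd_entry "<line from mpdtrace -l>")
#   mpd -h "$1" -p "$2" &
#   mpdtrace            # should now list both hosts if the join worked
```

Splitting at the last underscore keeps host names that themselves 
contain an underscore intact.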

In another scenario, I tried this on the installation machine:
[mhelal at manallpt ~]$ mpdboot -n 2
mhelal at manal's password:
mpdboot_manallpt.localhits (handle_mpd_output 388): from mpd on manal, 
invalid port info:
/home/mhelal: Permission denied.
/home/mhelal/mpich2-install/bin/mpd.py: Command not found.
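
One check that follows from the last two lines of that output is whether 
mpd.py exists and is executable at the same path on each machine. A 
minimal sketch, assuming the install path shown in the error message; 
check_mpd_path is a hypothetical helper name, not part of MPICH2.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR/mpd.py exists and is executable
# for the current user.
check_mpd_path() {
    if [ -x "$1/mpd.py" ]; then
        echo "ok: $1/mpd.py is executable"
    else
        echo "missing or not executable: $1/mpd.py" >&2
        return 1
    fi
}

# Locally, on the installation machine:
#   check_mpd_path /home/mhelal/mpich2-install/bin
# The same test on the second machine, run over ssh:
#   ssh manal 'test -x /home/mhelal/mpich2-install/bin/mpd.py && echo ok'
```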

How can I debug this problem? Any help is highly appreciated. I only 
have the MPICH2 README, which says to refer to the installation guide 
for more information, and I cannot find that guide. It would be really 
helpful if anyone could point me to a detailed, step-by-step tutorial on 
how to create a small, simple network to run MPI jobs, and on the things 
I need to take care of.

Thank you in advance,

Kind Regards,

