[Beowulf] MPICH mpd ring on a network of 2 PCs
Manal Helal
manalorama at gmail.com
Sat Dec 30 08:21:18 PST 2006
Hi
I am trying to set up a small cluster incrementally, to run MPI programs
only. I have 4 PCs running Linux Fedora Core: 2 with Core 5, one with
Core 6, and a new one that I will install with Core 6.
I installed MPICH2 on the Fedora Core 6 machine, and I can run mpd and
MPI programs on that machine fine. I can also ping and ssh from and to
all machines.

Then I shared the install bin path over SMB, and I can access it from
the other machine. I updated the mpd.hosts file (in the user folder on
the MPICH2 installation machine) with the names of both machines for
now. I copied .mpd.conf and mpd.hosts to the user folder on both
machines; I am not sure whether this is right.

On the second machine, I can read and write the MPICH2 bin folder, but
the only command I can run is mpd, and when I try mpdtrace it says no
mpd is running.

When I start mpd on the installation machine, I can mpdtrace it and get
the port number. When I then run "mpd -h hostname -p port &" on the
other machine, I receive:
**********************
[1] 7007
[mhelal@manal mhelal]# manal.localhits_45668: conn error in connect_rhs: Connection refused
manal.localhits_45668 (connect_rhs 726): failed to connect to rhs at 127.0.0.1 56317
manal.localhits_45668 (enter_ring 633): rhs connect failed
manal.localhits_45668 (run 245): failed to enter ring
**********************
Meanwhile, on the installation machine I keep getting "lost rhs;
re-entering ring" followed by "back in ring".
**********************
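One possible reading of the first log, offered as a guess rather than a diagnosis: the joining mpd tried to reach its right-hand neighbor at 127.0.0.1, a loopback address. That suggests a host name resolved to the loopback interface instead of the machine's LAN address. Below is a minimal Python sketch for checking this on each machine. It assumes, without confirming it against mpd's source, that mpd advertises whatever an ordinary lookup of the host name returns. The names `lookup_is_loopback`, `can_connect` and `check_host.py` are hypothetical, and `can_connect` only approximates the plain TCP connection mpd makes when entering the ring.

```python
import socket
import sys


def lookup_is_loopback(hostname: str) -> bool:
    """True if an ordinary IPv4 lookup of hostname yields a 127.x.x.x address."""
    return socket.gethostbyname(hostname).startswith("127.")


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection to host:port; False on refusal or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts and lookup failures
        return False


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        if lookup_is_loopback(name):
            print(f"{name} resolves to loopback; other hosts cannot reach it there")
        else:
            print(f"{name} resolves to {socket.gethostbyname(name)}")
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        print(f"cannot resolve {name}: {exc}")
    # Optional probe of a running mpd, with the port reported by mpdtrace:
    #   python3 check_host.py <host> <port>
    if len(sys.argv) == 3:
        host, port = sys.argv[1], int(sys.argv[2])
        ok = can_connect(host, port)
        print(f"TCP connect to {host}:{port}:", "ok" if ok else "failed")
```

If the lookup on either machine reports loopback, the entries in /etc/hosts for that machine's own name are the first place to look.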
In another scenario, I tried this on the installation machine:
**********************
[mhelal@manallpt ~]$ mpdboot -n 2
mhelal@manal's password:
mpdboot_manallpt.localhits (handle_mpd_output 388): from mpd on manal, invalid port info:
/home/mhelal: Permission denied.
/home/mhelal/mpich2-install/bin/mpd.py: Command not found.
**********************
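The second log suggests that mpdboot, working over ssh, looked for mpd.py on manal at the same path as on manallpt, and that the account on manal could not access /home/mhelal. Below is a minimal Python sketch of a per-machine readiness check, to be run on each node. It assumes the convention from the MPICH2 installation instructions: a ~/.mpd.conf that only its owner can access (mode 600) and that contains a `secretword=` line. The helper names are hypothetical, and so is the default mpd.py path, which is simply copied from the error message and should be adjusted.

```python
import os
import stat
import sys


def mpd_conf_problems(path: str) -> list:
    """List problems with an mpd config file; an empty list means it looks usable."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group or other permission bits set
        problems.append(f"{path} has mode {mode:o}; expected 600")
    try:
        with open(path) as f:
            lines = [line.strip() for line in f]
    except OSError as exc:
        return problems + [f"cannot read {path}: {exc}"]
    key = "secretword="
    if not any(line.startswith(key) and len(line) > len(key) for line in lines):
        problems.append(f"{path} has no non-empty secretword= line")
    return problems


def is_runnable(path: str) -> bool:
    """True if path is a regular file this user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Hypothetical default copied from the mpdboot error; pass another path as argv[1].
    default = os.path.join(home, "mpich2-install", "bin", "mpd.py")
    mpd_py = sys.argv[1] if len(sys.argv) > 1 else default
    if not os.access(home, os.R_OK | os.X_OK):
        print(f"home directory {home} is not accessible")
    for problem in mpd_conf_problems(os.path.join(home, ".mpd.conf")):
        print(problem)
    status = "runnable" if is_runnable(mpd_py) else "missing or not executable"
    print(f"{mpd_py}: {status}")
```

Running it as the same user on both machines would show whether the path mpdboot expects actually exists on the remote side.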
How can I debug this problem? Any help is highly appreciated. I only
have the MPICH2 README, which says to refer to the installation guide
for more information, and I cannot find that guide. It would be really
helpful if anyone could point me to a detailed, step-by-step tutorial
on how to set up a small, simple network to run MPI jobs, and on the
things I need to take care of.
Thank you in advance,
Kind Regards,
Manal