[Beowulf] Building a 2 node cluster using mpich

Kalpana Kanthasamy kalpana0611 at gmail.com
Thu Dec 27 09:33:49 PST 2007


Hi guys, I am a beginner with Linux and with clusters, but I really
need to experiment with this for my project. I have documented what
I have done so far, but I got stuck at a certain point. Let me
explain what I have done.

After searching the internet for a few days, I decided to follow these
two guides:

http://blizzard.rwic.und.edu/~nordlie/deuce/
http://www.mcsr.olemiss.edu/bookshelf/articles/how_to_build_a_cluster.html

1. Installed a Linux distribution (I am using openSUSE on both
computers in the cluster).

2. During the installation process, assign a hostname and, of course,
a unique IP address to each node in the cluster; the gateway is the
router. Hostname: localhost, domain: localdomain


3. The cluster is private. I have used IP address 192.168.0.190 for the
master node and 192.168.0.191 for the slave node.

4. Finally, create identical user accounts on each node. In our case,
we created the user DevArticle on each node in our cluster. You can
either create the identical user accounts during installation, or
use the adduser command as root.
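
Because /home is later shared over NFS, "identical" accounts should probably also have the same numeric UID on both nodes, since classic NFS matches file ownership by number rather than by name. A minimal sketch of such a check, assuming you have copied the slave's /etc/passwd somewhere readable on the master; the helper names uid_of and same_uid are made up for this illustration:

```shell
#!/bin/sh
# Print the numeric UID of user $2 as recorded in passwd-format file $1.
# Prints nothing if the user is not listed.
uid_of() {
    awk -F: -v user="$2" '$1 == user { print $3 }' "$1"
}

# Succeed only if user $3 has the same, non-empty UID in files $1 and $2.
same_uid() {
    a=$(uid_of "$1" "$3")
    b=$(uid_of "$2" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, with a hypothetical copy at /tmp/passwd.slave, "same_uid /etc/passwd /tmp/passwd.slave DevArticle && echo match" would print "match" only when both nodes agree on the UID.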


Configuration on all nodes

On all nodes
5. We now need to configure rsh on each node in our cluster. Create
.rhosts files in the home directories of the user and of root. Our
.rhosts files for the DevArticle users are as follows:
Master DevArticle
Slave DevArticle

The .rhosts files for root users are as follows:

Master root
Slave root
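
The rsh daemons generally ignore a .rhosts file that is writable by anyone other than its owner, so the permissions matter as much as the contents. A small sketch that writes the file and restricts it to mode 600; write_rhosts is a hypothetical helper name, and the directory is passed in so it can be tried on a scratch directory first:

```shell
#!/bin/sh
# Write a .rhosts file into directory $1 that trusts user $2 on each of the
# remaining arguments (host names), then restrict it to owner read/write.
write_rhosts() {
    dir=$1
    user=$2
    shift 2
    : > "$dir/.rhosts"
    for host in "$@"; do
        printf '%s %s\n' "$host" "$user" >> "$dir/.rhosts"
    done
    chmod 600 "$dir/.rhosts"
}
```

Run as the DevArticle user, "write_rhosts "$HOME" DevArticle Master Slave" would produce the two lines shown above for that account.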



On all nodes
6. Next, I modified the /etc/hosts.equiv file; it is the same on both
Master and Slave:

192.168.0.190 Master.localhost.localdomain Master
127.0.0.1          localhost
192.168.0.191  Slave.localhost.localdomain Slave
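
To double check that both nodes agree on these name-to-address mappings, the entries can be looked up with a few lines of awk. This is only a sketch; addr_of is a made-up helper, and it reads whatever hosts-format file you point it at rather than consulting the resolver:

```shell
#!/bin/sh
# Print the address that hosts-format file $1 assigns to name $2.
# Comment lines are skipped; the name may be the canonical name or an alias.
addr_of() {
    awk -v name="$2" '
        $1 ~ /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}
```

Running "addr_of FILE Slave" on each node, with FILE set to the file holding the lines above, should print 192.168.0.191 in both places.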


7. Do not remove the 127.0.0.1 localhost line. The hosts.allow file on
each node was modified by adding ALL+ as its only line. This allows
anyone on any node to connect to any other node in our private
cluster.



On all nodes
8. To allow root users to use rsh, I had to add the following lines to
the /etc/securetty file:

rsh
rlogin
rexec

pts/0
pts/1



On all nodes
9. I also modified the /etc/pam.d/rsh file:
#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth       sufficient   /lib/security/pam_nologin.so
auth       optional     /lib/security/pam_securetty.so
auth       sufficient   /lib/security/pam_env.so
auth       sufficient   /lib/security/pam_rhosts_auth.so
account  sufficient   /lib/security/pam_stack.so service=system-auth
session   sufficient   /lib/security/pam_stack.so service=system-auth

On all nodes
rsh, rlogin, telnet and rexec are disabled by default. To change this,
I went to the /etc/xinetd.d directory and edited each of the service
files (rsh, rlogin, telnet and rexec), changing the disable = yes line
to disable = no.
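
The same edit can be scripted instead of done by hand. A rough sketch, assuming the service files use xinetd's usual "disable = yes" attribute line; enable_service is a hypothetical helper name:

```shell
#!/bin/sh
# Rewrite "disable = yes" to "disable = no" in the xinetd service file $1.
# A temporary file is used instead of "sed -i" so the sketch stays portable,
# and the result is copied back with cat so the file keeps its permissions.
enable_service() {
    file=$1
    tmp=$(mktemp) || return 1
    status=0
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
        "$file" > "$tmp" && cat "$tmp" > "$file" || status=1
    rm -f "$tmp"
    return $status
}
```

As root, something like "for s in rsh rlogin telnet rexec; do enable_service /etc/xinetd.d/$s; done" would cover the four files; xinetd still has to be restarted afterwards.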

Once the changes were made to each file (and saved), I closed the
editor and issued the following commands:

1. Turn on the rsh daemon using the chkconfig command: chkconfig rsh on
2. To check the rsh daemon's status, run the chkconfig command:
chkconfig --list rsh
3. Run the /etc/rc.d/xinetd restart command.
4. Restart xinetd with /sbin/service xinetd restart



The Mounting Process

On the Master node
I edited /etc/exports

I used the YaST NFS server tool to set this up, then double checked
my /etc/exports file. This is how it looks:

/home		192.168.1.190/255.255.255.0(rw,no_root_squash)
/usr/local	192.168.1.190/255.255.255.0(rw,no_root_squash)
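
The addresses in these export entries (192.168.1.x) and the node addresses given in step 3 (192.168.0.x) differ in the third octet, so it may be worth checking whether the slave actually falls inside the exported network. A rough sketch in plain POSIX shell; the helper names ip_to_int and in_network are made up for this illustration:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies in the network given by address $2 and netmask $3.
in_network() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

With the values from this post, "in_network 192.168.0.191 192.168.1.190 255.255.255.0" fails, while the same call with 192.168.0.190 as the network address succeeds.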


On the Slave node
I edited /etc/fstab

I used the YaST NFS client tool to set this up, then double checked
my /etc/fstab file. This is how it looks:
----------------------------------------------------------------------------------------------------------
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part6	/	ext3	acl,user_xattr 1 1
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part1	/windows/C	ntfs-3g	users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part5	swap	swap	defaults 0 0
proc	/proc	proc	defaults 0 0
sysfs	/sys	sysfs	noauto 0 0
debugfs	/sys/kernel/debug	debugfs	noauto 0 0
usbfs	/proc/bus/usb	usbfs	noauto 0 0
devpts	/dev/pts	devpts	mode=0620,gid=5 0 0
/dev/fd0	/media/floppy	auto	noauto,user,sync 0 0
Master:/home	/home	nfs	rw 0 0
Master:/usr/local	/usr/local	nfs	ro 0 0




I also changed the /etc/mtab file, according to the mpich documentation:

----------------------------------------------------------------------------------------------------------
/dev/sda5 / ext3 rw,acl,user_xattr 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
udev /dev tmpfs rw 0 0
devpts /dev/pts devpts rw,mode=0620,gid=5 0 0
/dev/sda1 /windows/C fuseblk rw,noexec,nosuid,nodev,noatime,allow_other,default_permissions,blksize=4096 0 0
securityfs /sys/kernel/security securityfs rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0

Master:/home /rmt/Master/home nfs noac 0 0

Master:/usr/local /rmt/Master/usr/local nfs noac 0 0 0
-----------------------------------------------------------------------------------------------------------

After that I did this:

On each node, type ifconfig and make sure that the machine has its
appropriate internal IP address (such as 192.168.0.X).
On each node, go to /etc/rc.d and type ./network stop.
On the master node, also type ../nfs stop
On the master node, type ../nfs start
On each node, type ../network start.


I guess I mounted everything properly, because I made sure I followed
the websites. I could also access the files from the slave machine.


I can ping both machines. If I type
Master:/ # rsh Slave
Master:/ # ls -a
or
Slave:/ # rsh Master
Slave:/ # ls -a


it works on both machines, and ls -a shows me the files. It is only
when I pass rsh a full command that it fails with "permission denied".
I have emptied the hosts.allow and hosts.deny files on both Master and
Slave.



But when I type commands like these, I get permission denied:
Master:/ # rsh Slave date
Master:/ # permission denied

or

Master:/ # rsh Master pwd
Master:/ # permission denied
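
One detail that may matter here: rsh with no command hands the session over to rlogin, so an interactive login that works does not exercise the same service (or the same PAM file) as "rsh host command". A small sketch for testing the command path on each host in turn, assuming the client exits non-zero when it is refused; check_remote and the RSH_CMD variable are hypothetical names for this illustration:

```shell
#!/bin/sh
# RSH_CMD is the remote shell client to test; override it to try ssh instead.
RSH_CMD=${RSH_CMD:-rsh}

# Run "date" on host $1 through $RSH_CMD and report whether it succeeded.
# On failure, whatever the client printed is included in the report.
check_remote() {
    host=$1
    if out=$("$RSH_CMD" "$host" date 2>&1); then
        echo "$host: ok"
    else
        echo "$host: FAILED ($out)"
    fi
}
```

Calling "check_remote Master; check_remote Slave" as both root and DevArticle would show which combination of user and host is being rejected.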

OK, here is where I am stuck: I tried installing mpich, but neither
rsh nor ssh was detected during configuration (permission denied). I
think it has something to do with my NFS. Any ideas, guys?



