[Beowulf] Building a 2 node cluster using mpich
Kalpana Kanthasamy
kalpana0611 at gmail.com
Thu Dec 27 09:33:49 PST 2007
Hi guys, I am a beginner with Linux and with clusters, but I really
need to get this working for my project. I have documented what
I have done so far, but I got stuck at a certain point. Let me
explain what I have done.
After searching the internet for a few days, I decided to follow these two guides:
http://blizzard.rwic.und.edu/~nordlie/deuce/
http://www.mcsr.olemiss.edu/bookshelf/articles/how_to_build_a_cluster.html
1. Installed a Linux distribution (I am using openSUSE on both
computers in the cluster).
2. During the installation process, assign a hostname and, of course,
a unique IP address to each node in the cluster. The gateway is the
router. Hostname: localhost, domain: localdomain.
3. The cluster is private. I have used IP address 192.168.0.190 for the
master node and 192.168.0.191 for the slave node.
4. Finally, create identical user accounts on each node. In our case,
we create the user DevArticle on each node in our cluster. You can
either create the identical user accounts during installation, or you
can use the adduser command as root.
Configuration on all nodes
On all nodes
5. We now need to configure rsh on each node in our cluster. Create
.rhosts files in the user and root directories. Our .rhosts files for
the DevArticle users are as follows:
Master DevArticle
Slave DevArticle
The .rhosts files for root users are as follows:
Master root
Slave root
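One detail worth checking at this step: rshd generally ignores a .rhosts file that is not owned by the account, or that is writable by group or others. A minimal sketch of writing the file with tight permissions, assuming the hostnames Master and Slave and the user DevArticle from above. The write_rhosts helper name is hypothetical, and the demonstration runs in a scratch directory rather than a real home directory.

```shell
# write_rhosts is a hypothetical helper, not part of rsh or mpich.
# Usage: write_rhosts <home-directory> <user>
write_rhosts() {
    dir="$1"
    user="$2"
    # One "host user" pair per line, matching the entries listed above.
    printf '%s %s\n' Master "$user" Slave "$user" > "$dir/.rhosts"
    # Owner read/write only; looser modes are commonly rejected by rshd.
    chmod 600 "$dir/.rhosts"
}

# Demonstration in a scratch directory, so no real home directory is touched.
demo_home="$(mktemp -d)"
write_rhosts "$demo_home" DevArticle
cat "$demo_home/.rhosts"
ls -l "$demo_home/.rhosts"
```

On the real nodes the first argument would be the user's home directory, and the file would need to be owned by that user.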
On all nodes
6. Next, I modified the /etc/hosts.equiv file, with the same contents on
both Master and Slave:
192.168.0.190 Master.localhost.localdomain Master
127.0.0.1 localhost
192.168.0.191 Slave.localhost.localdomain Slave
7. Do not remove the 127.0.0.1 localhost line. The hosts.allow file on
each node was modified by adding ALL+ as the only line in the file.
This gives anyone on any node permission to connect to any other node
in our private cluster.
On all nodes
8. To allow root users to use rsh, I had to add the following lines to
the /etc/securetty file:
rsh
rlogin
rexec
pts/0
pts/1
On all nodes
9. Also, I modified the /etc/pam.d/rsh file:
#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth sufficient /lib/security/pam_nologin.so
auth optional /lib/security/pam_securetty.so
auth sufficient /lib/security/pam_env.so
auth sufficient /lib/security/pam_rhosts_auth.so
account sufficient /lib/security/pam_stack.so service=system-auth
session sufficient /lib/security/pam_stack.so service=system-auth
On all nodes
rsh, rlogin, telnet and rexec are disabled by default. To change this,
I navigated to the /etc/xinetd.d directory and modified each of the
service files (rsh, rlogin, telnet and rexec), changing the disable =
yes line to disable = no.
Once the changes were made to each file (and saved), I closed the
editor and issued the following commands:
Turn on the rsh daemon using the chkconfig command: chkconfig rsh on
1. To check the rsh daemon's status, run the chkconfig command:
chkconfig --list rsh
2. Run the /etc/rc.d/xinetd restart command.
3. Restart xinetd with /sbin/service xinetd restart
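The edit to the service files can be sketched with sed. This assumes the stock xinetd layout, where the attribute is spelled disable, and GNU sed for the -i option. It runs against a scratch copy rather than /etc/xinetd.d, and the sample file contents are illustrative, not copied from either node.

```shell
# Scratch copy standing in for /etc/xinetd.d/rsh (contents are illustrative).
demo_dir="$(mktemp -d)"
cat > "$demo_dir/rsh" <<'EOF'
service shell
{
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.rshd
        disable         = yes
}
EOF

# Flip "disable = yes" to "disable = no", keeping the original spacing.
sed -i 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes/\1no/' "$demo_dir/rsh"

# Show the resulting line so the change can be confirmed by eye.
grep disable "$demo_dir/rsh"
```

The same sed line could be pointed at each of the four real service files, followed by an xinetd restart so the change takes effect.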
The Mounting Process
On the Master node
I edited /etc/exports using the YaST NFS server tool, then
double-checked the file. This is how it looks:
/home 192.168.1.190/255.255.255.0(rw,no_root_squash)
/usr/local 192.168.1.190/255.255.255.0(rw,no_root_squash)
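The export entries name 192.168.1.190 with netmask 255.255.255.0, while the nodes were given 192.168.0.190 and 192.168.0.191 in step 3. With that netmask only the first three octets have to match, so a quick comparison may be worth doing. A minimal sketch: same_slash24 is a hypothetical helper and only handles the 255.255.255.0 case.

```shell
# same_slash24 is a hypothetical helper: it succeeds when two IPv4 addresses
# share their first three octets, i.e. fall in the same 255.255.255.0 network.
same_slash24() {
    [ "${1%.*}" = "${2%.*}" ]
}

export_addr=192.168.1.190   # address written in /etc/exports above
for node in 192.168.0.190 192.168.0.191; do
    if same_slash24 "$node" "$export_addr"; then
        echo "$node is covered by the export"
    else
        echo "$node is NOT covered by the export"
    fi
done
```

If the file really reads as shown, neither node address falls inside the exported network, so the third octet in /etc/exports may be a typo worth ruling out.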
On the Slave node
I edited /etc/fstab using the YaST NFS client tool, then
double-checked the file. This is how it looks:
----------------------------------------------------------------------------------------------------------
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part6 / ext3 acl,user_xattr 1 1
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part1 /windows/C ntfs-3g users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
/dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part5 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
usbfs /proc/bus/usb usbfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0
Master:/home /home nfs rw 0 0
Master:/usr/local /usr/local nfs ro 0 0
I also changed the /etc/mtab file, according to the mpich documentation:
----------------------------------------------------------------------------------------------------------
/dev/sda5 / ext3 rw,acl,user_xattr 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
udev /dev tmpfs rw 0 0
devpts /dev/pts devpts rw,mode=0620,gid=5 0 0
/dev/sda1 /windows/C fuseblk rw,noexec,nosuid,nodev,noatime,allow_other,default_permissions,blksize=4096 0 0
securityfs /sys/kernel/security securityfs rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
Master:/home /rmt/Master/home nfs noac 0 0
Master:/usr/local /rmt/Master/usr/local nfs noac 0 0 0
-----------------------------------------------------------------------------------------------------------
After that I did this:
On each node, type ifconfig and make sure that the machine has its
appropriate internal IP address (such as 192.168.0.X).
On each node, go to /etc/rc.d and type ./network stop.
On the master node, also type ../nfs stop
On the master node, type ../nfs start
On each node, type ../network start.
I guess I mounted everything properly, because I made sure I followed the
websites, and I could access the files from the slave machine as well.
I could ping both machines, and if I type
Master:/ # rsh Slave
Master:/ # ls -a
or
Slave:/ # rsh Master
Slave:/ # ls -a
it works on both machines, and when I then type ls -a I get to see
the files. It is only when I give rsh a full command that it fails with
permission denied, even after I emptied the hosts.allow and hosts.deny
files on both Master and Slave.
For example, when I type commands like
Master:/ # rsh Slave date
Master:/ # permission denied
or
Master:/ # rsh Master pwd
Master:/ # permission denied
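One thing that may explain the difference: rsh with no command hands off to rlogin, so the interactive case goes through the xinetd login service, while rsh Slave date goes through the shell service (rshd) and typically its own PAM file. To narrow down which path fails, it can help to run the same non-interactive command against every node and with both clients. A rough sketch, assuming the hostnames Master and Slave; check_remote and the RSH variable are hypothetical names.

```shell
# RSH names the remote-shell client; override it (e.g. RSH=ssh) to compare.
RSH="${RSH:-rsh}"

# check_remote is a hypothetical helper: it runs "date" on the given host
# through $RSH and reports whether the non-interactive command succeeded.
check_remote() {
    host="$1"
    if "$RSH" "$host" date >/dev/null 2>&1; then
        echo "$host: ok"
    else
        echo "$host: FAILED"
    fi
}

# Usage on the cluster (not run here):
#   for h in Master Slave; do check_remote "$h"; done
```

Running the loop once with the default and once with RSH=ssh would show whether the failure is specific to rshd or common to both clients.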
OK, here is where I am stuck. I tried installing mpich, but neither
rsh nor ssh was detected during configuration (permission denied).
I think it has something to do with my NFS setup. Any ideas, guys?