[Beowulf] Building a 2 node cluster using mpich


Reuti reuti at staff.uni-marburg.de
Sun Dec 30 14:33:27 PST 2007


Hi,

On 27.12.2007 at 18:33, Kalpana Kanthasamy wrote:

> Hi guys, I am a beginner with Linux and also with clusters, but I really
> need to experiment with this for my project. Anyway, I have documented what
> I have done so far, but I got stuck after a certain point... Let me
> explain what I have done.
>
> After searching through the internet for a few days, I decided to use
>
> http://blizzard.rwic.und.edu/~nordlie/deuce/
> http://www.mcsr.olemiss.edu/bookshelf/articles/how_to_build_a_cluster.html
>
> 1.Installed a Linux distribution (I am using openSUSE on both
> computers in the cluster).
>
> 2.During the installation process, assign hostnames and of course,
> unique IP addresses for each node in your cluster, gateway is the
> router. Hostname – localhost, domain - localdomain
>
>
> 3.Cluster is private. I have used IP address 192.168.0.190 for the
> master node and 192.168.0.191 for the slave node.
>
> 4.Finally, create identical user accounts on each node. In our case,
> we create the user DevArticle on each node in our cluster. You can
> either create the identical user accounts during installation, or you
> can use the adduser command as root.

Better to use NIS (or LDAP), so you only have to define the users once.

>
>
> Configuration on all nodes
>
> On all nodes
> 5.We now need to configure rsh on each node in our cluster. Create
> .rhosts files in the user and root directories. Our .rhosts files for
> the DevArticle users are as follows:
> Master DevArticle
> Slave DevArticle
>
> The .rhosts files for root users are as follows:
>
> Master root
> Slave root
>
>
>
> On all nodes
> 6.Next, I modified the /etc/hosts.equiv file, with the same content on both
> Master and Slave
>
> 192.168.0.190 Master.localhost.localdomain Master
> 127.0.0.1          localhost
> 192.168.0.191  Slave.localhost.localdomain Slave

Only the hostnames belong in that file, hence just two lines:

Master
Slave
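
A minimal sketch of writing such a file, assuming the two hostnames used in this thread. The function name write_hosts_equiv is made up for illustration and is not part of rsh or mpich. The lines with IP addresses quoted above look like /etc/hosts entries, so they would normally go into /etc/hosts instead.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): write one bare hostname
# per line to the given file, replacing its previous content.
write_hosts_equiv() {
    target=$1
    shift
    : > "$target"                  # truncate or create the file
    for host in "$@"; do
        printf '%s\n' "$host" >> "$target"
    done
}

# Example call as root, with the hostnames from this thread:
# write_hosts_equiv /etc/hosts.equiv Master Slave
```

The same file would then be needed on both nodes.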

>
> 7.Do not remove the 127.0.0.1 localhost line. The hosts.allow file on
> each node was modified by adding ALL+ as the only line in the file.
> This gives anyone on any node permission to connect to any other node
> in our private cluster.
>
>
>
> On all nodes
> 8.To allow root users to use rsh, I had to add the following lines to
> the /etc/securetty file:
>
> rsh
> rlogin
> rexec
>
> pts/0
> pts/1
>
>
> On all nodes
> 9.Also, I modified the /etc/pam.d/rsh file:
> #%PAM-1.0
> # For root login to succeed here with pam_securetty, "rsh" must be
> # listed in /etc/securetty.
> auth       sufficient   /lib/security/pam_nologin.so
> auth       optional     /lib/security/pam_securetty.so

You can try commenting out the line above.

> auth       sufficient   /lib/security/pam_env.so
> auth       sufficient   /lib/security/pam_rhosts_auth.so
> account  sufficient   /lib/security/pam_stack.so service=system-auth
> session   sufficient   /lib/security/pam_stack.so service=system-auth
>
> On all nodes
> Rsh, rlogin, telnet and rexec are disabled by default. To change this,
> I navigated to the /etc/xinetd.d directory and modified each of the
> service files (rsh, rlogin, telnet and rexec), changing the disable =
> yes line to disable = no.
>
> Once the changes were made to each file (and saved), I closed the
> editor and issued the following command:
>
> Turn on the rsh daemon using the chkconfig command: chkconfig rsh on
> 1.To check the rsh daemon's status, run the chkconfig command:
> chkconfig --list rsh
> 2.Run the /etc/rc.d/xinetd restart command.
> 3.Restart xinetd with /sbin/service xinetd restart
>
>
>
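To double-check the xinetd step, one could look for a remaining "disable = yes" line in each service file (the xinetd attribute is spelled disable). This is only a sketch: xinetd_enabled is a made-up helper name, and the file names under /etc/xinetd.d are taken from the description above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given xinetd service file exists
# and has no active (uncommented) "disable = yes" line.
xinetd_enabled() {
    [ -f "$1" ] || return 1
    if grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$1"; then
        return 1
    fi
    return 0
}

# Example, using the service names mentioned in this thread:
# for svc in rsh rlogin rexec; do
#     xinetd_enabled "/etc/xinetd.d/$svc" || echo "$svc is still disabled"
# done
```

xinetd has to be restarted after any change to these files, as already described above.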
> The Mounting Process
>
> On the Master node
> I edited /etc/exports
>
> I used the YaST NFS server tool and then double-checked my
> /etc/exports file. This is how it looks:
>
> /home		192.168.1.190/255.255.255.0(rw,no_root_squash)
> /usr/local	192.168.1.190/255.255.255.0(rw,no_root_squash)
>
>
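One thing worth checking, if the file really reads as quoted: the export entries name 192.168.1.190 with netmask 255.255.255.0, while the nodes were given 192.168.0.190 and 192.168.0.191 in step 3. The sketch below tests whether an address falls inside an address/netmask pair. ip_to_int and in_network are made-up helper names, and it assumes a shell whose arithmetic handles integers wider than 32 bits.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                      # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: in_network ADDRESS NETWORK NETMASK
# Succeeds if ADDRESS and NETWORK agree on all bits set in NETMASK.
in_network() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Example with the addresses from this thread:
# in_network 192.168.0.191 192.168.1.190 255.255.255.0 || echo "Slave not covered"
```

If the check fails for the Slave address, the entry in /etc/exports would not cover that client.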
> On the Slave node
> I edited /etc/fstab
>
> I used the YaST NFS client tool and then double-checked my
> /etc/fstab file. This is how it looks:
> ----------------------------------------------------------------------
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part6	/	ext3	acl,user_xattr 1 1
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part1	/windows/C	ntfs-3g	users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
> /dev/disk/by-id/scsi-SATA_WDC_WD800VE-00H_WD-WXEZ06F66679-part5	swap	swap	defaults 0 0
> proc	/proc	proc	defaults 0 0
> sysfs	/sys	sysfs	noauto 0 0
> debugfs	/sys/kernel/debug	debugfs	noauto 0 0
> usbfs	/proc/bus/usb	usbfs	noauto 0 0
> devpts	/dev/pts	devpts	mode=0620,gid=5 0 0
> /dev/fd0	/media/floppy	auto	noauto,user,sync 0 0
> Master:/home	/home	nfs	rw 0 0
> Master:/usr/local	/usr/local	nfs	ro 0 0
>
>
>
>
> I also changed the /etc/mtab file, according to the mpich documentation

I would never change /etc/mtab by hand, as it is maintained by mount and
umount. Where does the mpich documentation say to touch it?

> ----------------------------------------------------------------------
> /dev/sda5 / ext3 rw,acl,user_xattr 0 0
> proc /proc proc rw 0 0
> sysfs /sys sysfs rw 0 0
> debugfs /sys/kernel/debug debugfs rw 0 0
> udev /dev tmpfs rw 0 0
> devpts /dev/pts devpts rw,mode=0620,gid=5 0 0
> /dev/sda1 /windows/C fuseblk rw,noexec,nosuid,nodev,noatime,allow_other,default_permissions,blksize=4096 0 0
> securityfs /sys/kernel/security securityfs rw 0 0
> nfsd /proc/fs/nfsd nfsd rw 0 0
> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>
> Master:/home /rmt/Master/home nfs noac 0 0
>
> Master:/usr/local /rmt/Master/usr/local nfs noac 0 0 0
> ----------------------------------------------------------------------
>
> After that I did this
>
> On each node, type ifconfig and make sure that the machine has its
> appropriate interior IP address. (Such as 192.168.0.X).
> On each node, go to /etc/rc.d and type ./network stop.
> On the master node, also type ../nfs stop
> On the master node, type ../nfs start
> On each node, type ../network start.
>
>
> I guess I mounted properly, right? I made sure I followed the
> websites. I could also access the files from the slave machine.
>
>
>  I could ping both machines, and if I type
> Master:/ # rsh Slave
> Master:/ # ls -a
> or
> Slave:/ # rsh Master
> Slave:/ # ls -a
>
>
> this works on both machines: when I then type ls -a, I get to see
> the files. It is only when I pass a command directly, as below, that
> it fails and permission denied appears. I emptied my hosts.allow and
> hosts.deny files on both Master and Slave.
>
>
>
> But when I type commands like
> Master:/ # rsh Slave date
> Master:/ # permission denied
>
> or
>
> Master:/ # rsh Master pwd
> Master:/ # permission denied
>
> OK, here is where I am stuck, because I tried installing mpich, but
> neither rsh nor ssh was detected during configuration (permission
> denied). I think it has something to do with my NFS. Any ideas?

There is no need to allow it for root at all. Is it working for a  
normal user? Then you can already run parallel programs.
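
A quick way to check this as a normal user is to run a trivial command on every node. The following is only a sketch: check_nodes and the RSH variable are made-up names, and it assumes the remote shell client returns a non-zero exit status when the connection is refused.

```shell
#!/bin/sh
# Remote shell client to use; override with e.g. RSH=ssh (assumption).
RSH="${RSH:-rsh}"

# Hypothetical helper: run "date" on each given node and report
# whether the remote shell call succeeded. Returns 1 if any node failed.
check_nodes() {
    failed=0
    for node in "$@"; do
        if "$RSH" "$node" date >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run as the normal user with the hostnames from this thread:
# check_nodes Master Slave
```

If every node reports ok for the normal user, the root setup can be left alone.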

-- Reuti


