[Beowulf] PXEBoot struggling

Duke Nguyen duke.lists at gmx.com
Mon Nov 19 00:43:48 PST 2012

Hi folks,

So, following the advice and suggestions here, we started looking into 
booting our nodes through Gbit Ethernet. The OS of our choice is 
Scientific Linux 6.3 (SL6.3) for all master and client nodes. There are 
plenty of guides/instructions out there on the net, but I focused on and 
learnt mainly from two guides:


After a few days of struggling with the system, here is what I have done:
  * install SL6.3 on master node
  * install DHCP server (using dhcpd) on master node
  * install xinetd and enable tftp
  * open firewall for tftp and dhcpd using iptables
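
For reference, the PXE-related part of the DHCP setup boils down to 
something like the sketch below. The subnet, address range and server 
address are placeholders I made up for illustration, not our real ones; 
the important lines are next-server (the tftp server, i.e. the master 
node) and filename (the boot loader under /var/lib/tftpboot).

```
# /etc/dhcp/dhcpd.conf (sketch; all addresses below are assumed)
subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.100 10.0.0.200;   # assumed address pool for client nodes
    next-server 10.0.0.1;          # tftp server = master node (assumed IP)
    filename "pxelinux.0";         # boot loader fetched over tftp
}
```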

The above was enough for me to boot the SL6.3 LiveCD on a client node 
using PXE. The LiveCD boots fine and I was able to get into the desktop, 
but I was unable to proceed further :(. I can't install because these are 
diskless nodes.

What I did next:
  * install/enable nfs server
  * open firewall (iptables) for nfs services

Then, booting the SL6.3 LiveCD, I still could not see the NFS mount point 
to install the system on. The next trial was rsync. The first rsync was of 
the current system on the master node (which runs a lot of different 
services such as dhcpd, nfs, xinetd, tftp):

$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' localhost:/ 

where hostroot is exported through the NFS server:

$ cat /etc/exports
/diskless *(rw,sync,no_root_squash)

I then edited /diskless/hostroot/etc/fstab as instructed:

$ cat /diskless/hostroot/etc/fstab
none     /tmp      tmpfs   defaults         0 0
none     /dev/shm  tmpfs   defaults         0 0
none     /dev/pts  devpts  gid=5,mode=620   0 0
sysfs    /sys      sysfs   defaults         0 0
proc     /proc     proc    defaults         0 0

Finally, this is what I have on the tftp server:

$ ls -l /var/lib/tftpboot/
total 781140
-rw-r--r--. 1 root root  32149978 Nov 16 17:07 
-rw-r--r--. 1 root root 730839030 Nov 14 16:22 initrd0.img
-rw-r--r--. 1 root root     26828 Nov 14 16:22 pxelinux.0
drwxr-xr-x. 2 root root      4096 Nov 19 14:40 pxelinux.cfg
-r--r--r--. 1 root root   3987376 Nov 14 16:22 vmlinuz0
-rwxr-xr-x. 1 root root   3989680 Nov 15 23:22 
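
To make the setup concrete, a PXELINUX entry for an NFS root would look 
roughly like the sketch below. This is not a copy of my actual file: the 
kernel and initramfs names and the server address are hypothetical 
placeholders. As far as I understand, on SL6 the initramfs also has to be 
built with dracut's network/NFS support for root=nfs: to work.

```
# /var/lib/tftpboot/pxelinux.cfg/default (sketch, not my real file)
default diskless
prompt 0

label diskless
    # kernel and initramfs file names are hypothetical placeholders
    kernel vmlinuz-diskless
    # 10.0.0.1 is an assumed address for the master node (NFS server)
    append initrd=initramfs-diskless.img root=nfs:10.0.0.1:/diskless/hostroot ip=dhcp rw
```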

OK, booting this system, I was able to see the desktop on the client node, 
but couldn't log in (actually, I was able to log in but was kicked out 
right after that). ssh to the client node did the same thing: logged in 
and kicked out immediately. I don't know what was wrong :(.

OK, next I tried not to rsync the current master system, but to install 
with yum groupinstall instead:

$ yum -y groupinstall "Base" "Server Platform" --installroot=/diskless/root

but then I got a bunch of dependency errors. I asked on the SL 
forum/mailing list about these errors, but I have not gotten a good 
solution yet.

So finally I tried plugging a USB stick into the client node, booted the 
LiveCD again, installed a new system onto the USB stick, and then ran 
rsync from this system instead of the master node's system:

$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' 

Unfortunately this system could not boot up. It got stuck at something like:

INFO: task flush-0:18:1924 blocked for more than 120 seconds.

So to summarize:
  * boot using LiveCD -> OK, logging in works
  * boot using rsync of master node's system -> boots OK, can't log in
  * boot using rsync of client node's system -> can't boot
  * install client node using groupinstall -> can't do (dependency errors)

So, what should I do next? Please advise,

