[Beowulf] PXEBoot struggling
Duke Nguyen
duke.lists at gmx.com
Mon Nov 19 00:43:48 PST 2012
Hi folks,
Following the advice and suggestions here, we started looking into booting our nodes
over Gigabit Ethernet. Our OS of choice is Scientific Linux 6.3 (SL6.3)
for the master and all client nodes. There are plenty of guides and
instructions on the net, but I mainly followed and learned from
two of them:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/diskless-nfs-config.html
http://www.linuxquestions.org/questions/red-hat-31/building-a-diskless-redhat-enterprise-linux-cluster-765393/
After a few days of struggling with the system, here is what I have done:
* installed SL6.3 on the master node
* installed a DHCP server (dhcpd) on the master node
* installed xinetd and enabled tftp
* opened the firewall (iptables) for tftp and dhcpd
The above was enough to boot the SL6.3 LiveCD on a client node
via PXE. The LiveCD boots fine and I was able to get to the desktop,
but could not go any further :( I can't install, because these are
diskless nodes.
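For reference, the DHCP part of the steps above usually comes down to a subnet stanza that points PXE clients at the TFTP server and at pxelinux.0. A minimal sketch that only prints such a stanza; the subnet, address range and master IP (192.168.200.1) are hypothetical placeholders, not values taken from my setup.

```shell
#!/bin/sh
# Sketch: print a PXE-capable subnet stanza for ISC dhcpd.
# ASSUMPTIONS (hypothetical, adjust to your network): cluster subnet
# 192.168.200.0/24, master node (DHCP + TFTP server) at 192.168.200.1.
set -eu

SUBNET="${SUBNET:-192.168.200.0}"
NETMASK="${NETMASK:-255.255.255.0}"
MASTER_IP="${MASTER_IP:-192.168.200.1}"
RANGE_START="${RANGE_START:-192.168.200.100}"
RANGE_END="${RANGE_END:-192.168.200.200}"

pxe_stanza() {
    cat <<EOF
subnet $SUBNET netmask $NETMASK {
    range $RANGE_START $RANGE_END;
    # TFTP server the PXE ROM should contact (the master node)
    next-server $MASTER_IP;
    # Boot loader path, relative to the TFTP root (/var/lib/tftpboot)
    filename "pxelinux.0";
}
EOF
}

pxe_stanza
```

On EL6 the output would go into /etc/dhcp/dhcpd.conf. TFTP under xinetd is switched on with "chkconfig tftp on" followed by "service xinetd restart", and the matching firewall rules open UDP port 67 (DHCP) and UDP port 69 (TFTP), e.g. "iptables -I INPUT -p udp --dport 69 -j ACCEPT".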
What I did next:
* installed and enabled the NFS server
* opened the firewall (iptables) for the NFS services
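On EL6, the NFSv3 helper daemons (mountd, statd, lockd) register on dynamic ports unless they are pinned, which makes them hard to open in iptables. A minimal sketch, assuming they are pinned to the port numbers suggested (commented out) in EL6's /etc/sysconfig/nfs and a hypothetical cluster network 192.168.200.0/24; it only prints the rules for review rather than applying them.

```shell
#!/bin/sh
# Sketch: print iptables rules that open NFS to the cluster network on EL6.
# ASSUMPTIONS: cluster network 192.168.200.0/24 (hypothetical), and the
# helper daemons pinned via /etc/sysconfig/nfs to:
#   MOUNTD_PORT=892  STATD_PORT=662  LOCKD_TCPPORT=32803  LOCKD_UDPPORT=32769
set -eu

CLUSTER_NET="${CLUSTER_NET:-192.168.200.0/24}"

nfs_iptables_rules() {
    # rpcbind (111), nfsd (2049), mountd (892) and statd (662), TCP and UDP
    for port in 111 2049 892 662; do
        for proto in tcp udp; do
            echo "iptables -I INPUT -s $CLUSTER_NET -p $proto --dport $port -j ACCEPT"
        done
    done
    # lockd uses separate TCP and UDP ports
    echo "iptables -I INPUT -s $CLUSTER_NET -p tcp --dport 32803 -j ACCEPT"
    echo "iptables -I INPUT -s $CLUSTER_NET -p udp --dport 32769 -j ACCEPT"
}

nfs_iptables_rules
```

After uncommenting those port lines and restarting the nfs and nfslock services, the printed rules can be run as root and persisted with "service iptables save".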
Booting the SL6.3 LiveCD again, I still could not see the NFS mount point
to install the system onto. My next attempt was rsync. The first rsync
copied the current system on the master node (which runs many different
services such as dhcpd, NFS, xinetd and tftp):
$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' localhost:/ /diskless/hostroot
where hostroot is exported by the NFS server (through its parent directory /diskless):
$ cat /etc/exports
/diskless *(rw,sync,no_root_squash)
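One thing worth checking in the rsync above: /diskless lives on the master's own root filesystem, so copying localhost:/ also copies the /diskless tree (including hostroot itself) into the new root unless it is excluded. A sketch that only prints the command, with /tmp and /diskless added to the excludes; these two additions are my own suggestion, not something the two guides prescribe.

```shell
#!/bin/sh
# Sketch: print (not run) an rsync command that copies the master's root
# filesystem into an NFS root directory.
# ASSUMPTION: the export tree lives under /diskless on the same machine,
# so it is excluded as well.
set -eu

SRC="${SRC:-localhost:/}"
DEST="${DEST:-/diskless/hostroot}"

rsync_cmd() {
    cmd="rsync -a -e ssh"
    # Copy these directories but skip their contents: pseudo-filesystems
    # (/proc, /sys), scratch space (/tmp) and the export tree (/diskless)
    for dir in /proc /sys /tmp /diskless; do
        cmd="$cmd --exclude='$dir/*'"
    done
    echo "$cmd $SRC $DEST"
}

rsync_cmd
```

The printed line keeps its single quotes, so it can be pasted into a root shell as-is.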
After editing /diskless/hostroot/etc/fstab as instructed:
$ cat /diskless/hostroot/etc/fstab
none /tmp tmpfs defaults 0 0
none /dev/shm tmpfs defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
Finally, this is what I have on the TFTP server:
$ ls -l /var/lib/tftpboot/
total 781140
-rw-r--r--. 1 root root 32149978 Nov 16 17:07 initramfs-2.6.32-279.14.1.el6.x86_64.img
-rw-r--r--. 1 root root 730839030 Nov 14 16:22 initrd0.img
-rw-r--r--. 1 root root 26828 Nov 14 16:22 pxelinux.0
drwxr-xr-x. 2 root root 4096 Nov 19 14:40 pxelinux.cfg
-r--r--r--. 1 root root 3987376 Nov 14 16:22 vmlinuz0
-rwxr-xr-x. 1 root root 3989680 Nov 15 23:22 vmlinuz-2.6.32-279.14.1.el6.x86_64
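The listing does not show what is inside pxelinux.cfg. For an NFS-root setup like the one in the Red Hat guide, the boot entry points the kernel at the exported root using dracut's root=nfs:server:path syntax. A minimal sketch that prints such a pxelinux.cfg/default; the kernel and initramfs names match the listing above, while the server address 192.168.200.1 and the label name sl63-nfs are hypothetical.

```shell
#!/bin/sh
# Sketch: print a pxelinux.cfg/default entry that boots the EL6 kernel with
# its root filesystem on NFS (dracut's root=nfs:<server>:<path> syntax).
# ASSUMPTIONS (hypothetical): NFS server 192.168.200.1 exporting
# /diskless/hostroot, and an initramfs built with dracut's NFS support.
set -eu

KVER="${KVER:-2.6.32-279.14.1.el6.x86_64}"
NFS_SERVER="${NFS_SERVER:-192.168.200.1}"
NFS_ROOT="${NFS_ROOT:-/diskless/hostroot}"

pxe_default() {
    cat <<EOF
default sl63-nfs
prompt 0
label sl63-nfs
    kernel vmlinuz-$KVER
    append initrd=initramfs-$KVER.img root=nfs:$NFS_SERVER:$NFS_ROOT rw
EOF
}

pxe_default
```

The output would be redirected to /var/lib/tftpboot/pxelinux.cfg/default. The initramfs has to include dracut's NFS support (on EL6 this comes from the dracut-network package), otherwise it cannot mount root=nfs:...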
OK, booting this system I got as far as the desktop login on the client node,
but could not log in (more precisely, I could log in but was kicked out
right afterwards). Logging in via ssh behaved the same way: I got in and
was kicked out immediately. I don't know what went wrong :(
Next, instead of rsyncing the master's current system, I tried
installing a fresh one with yum groupinstall:
$ yum -y groupinstall "Base" "Server Platform" --installroot=/diskless/root
but I got a bunch of dependency errors. I asked on the SL
forum/mailing list about those errors, but have not received a good
solution yet.
So finally I put a USB stick in the client node, booted the
LiveCD again, installed a new system onto the USB stick,
and then rsynced from that system instead of the master node's:
$ rsync -a -e ssh --exclude='/proc/*' --exclude='/sys/*' 192.168.200.2:/ /diskless/clientroot
Unfortunately this system could not boot. It got stuck at something like:
INFO: task flush-0:18:1924 blocked for more than 120 seconds.
So to summarize:
* boot the LiveCD -> OK, logging in works fine
* boot the rsync copy of the master node's system -> boots OK, but can't log in
* boot the rsync copy of the client node's system -> can't boot
* install a client root with yum groupinstall -> fails with dependency errors
So, what should I do next? Please advise,
Thanks,
D.