Scyld - slave node boot failure


Andrew Shewmaker shewa at inel.gov
Mon Jun 25 11:03:53 PDT 2001


I have installed a Scyld Beowulf master node and I am having problems
with the slave nodes. Their addresses show up as unknown in beosetup;
I move an address to the middle column and click Apply. The slave nodes
then fail in the third phase of their boot-up, after the bpslave daemon
is started, with a message like "short read - lost connection to master".
The slave then reboots after waiting 30 seconds.

All of the hardware is identical, including the master node: Slot A
Athlons with one network card apiece. I am using the Scyld prerelease
CDs with the update RPMs from the website.

Here is the content of /var/log/beowulf/node.0:

node_up: Setting system clock.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking / (type=fs_size=65536)...
setup_fs: Mounting / on /rootfs/ext2... (type=fs_size=65536; options=0)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking 134.20.8.76:/home (type=nfs)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting 134.20.8.76:/home on /rootfs//home... (type=nfs; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=proc)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//proc... (type=proc; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=devpts)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//dev/pts... (type=devpts; options=gid=5,mode=620)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
rfork: Invalid argument
Failed to create /etc/mtab.
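
Since most of the log above is the same few messages repeated, a short
script can collapse it to the distinct lines and their counts. This is
only a minimal sketch for reading the log, not part of the Scyld tools;
the function name summarize_log is made up for illustration, and it
assumes the log has been read into a string with wrapped lines rejoined.

```python
from collections import Counter


def summarize_log(text):
    """Return (message, count) pairs in order of first appearance."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            counts[line] += 1
    # Counter is a dict subclass, so items() keeps insertion order
    # (Python 3.7+).
    return list(counts.items())


# Hypothetical usage on the master node:
#   with open("/var/log/beowulf/node.0") as f:
#       for message, count in summarize_log(f.read()):
#           print(f"{count:3d}  {message}")
```

Run against this log, the output would put the bpsh "Node 0 is down"
lines and the modules.dep lines on one row each, which makes the final
"rfork: Invalid argument" and "Failed to create /etc/mtab." easier to
spot.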


I have successfully installed both the prerelease and the final release
on a different cluster and did not see this problem there. I did update
the master node before I tried to boot a slave node. Could my
difficulties be the result of a botched update? I have tried booting the
slaves with the prerelease CD as well as a floppy, so I don't think this
is a problem with mismatched versions.

Thanks for any help,

Andrew Shewmaker




