Scyld - slave node boot failure
Andrew Shewmaker
shewa at inel.gov
Mon Jun 25 11:03:53 PDT 2001
I have installed a Scyld Beowulf master node and I am having problems with the slave nodes.
The addresses show up as unknown in beosetup. I move an address to the middle column and click Apply. The slave nodes then fail in the third phase of their boot, after the bpslave daemon is started, with a message like "short read - lost connection to master". The slave then reboots after waiting 30 seconds.
All of the hardware is identical, including the master node: Slot A Athlons with one network card apiece. I am using the Scyld prerelease CDs with the update RPMs from the website.
Here is the content of /var/log/beowulf/node.0:
node_up: Setting system clock.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
node_up: TODO set interface netmask.
node_up: Configuring loopback interface.
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking / (type=fs_size=65536)...
setup_fs: Mounting / on /rootfs/ext2... (type=fs_size=65536; options=0)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking 134.20.8.76:/home (type=nfs)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting 134.20.8.76:/home on /rootfs//home... (type=nfs; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=proc)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//proc... (type=proc; options=defaults)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
setup_fs: Checking none (type=devpts)...
bpsh: Node 0 is down. (ignoring)
setup_fs: Mounting none on /rootfs//dev/pts... (type=devpts; options=gid=5,mode=620)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
beoboot: /lib/modules//modules.dep missing
/usr/lib/beoboot/bin/node_modprobe: /lib/modules//modules.dep: No such file or directory
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
bpsh: Node 0 is down. (ignoring)
rfork: Invalid argument
Failed to create /etc/mtab.
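Most of the log above is the repeated "bpsh: Node 0 is down. (ignoring)" line, which makes the distinct messages hard to pick out. A minimal sketch, assuming Python is available on the master node, collapses runs of identical consecutive lines into a single line with a count. The function names are hypothetical and are not part of the Scyld tools.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (text, count) pairs."""
    # Strip trailing newlines so lines read from a file compare cleanly.
    stripped = (line.rstrip("\n") for line in lines)
    return [(text, sum(1 for _ in run)) for text, run in groupby(stripped)]


def format_collapsed(pairs):
    """Render (text, count) pairs, marking any line that repeated."""
    return [f"{text}  [x{count}]" if count > 1 else text
            for text, count in pairs]
```

To use it on the node log, open /var/log/beowulf/node.0, pass the file object to collapse_repeats, and print each line returned by format_collapsed. Only consecutive duplicates are merged, so the order of events in the log is preserved.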
I have successfully installed both the prerelease and the final release on a different cluster, and I did not see this problem there. I did update the master node before I tried to boot a slave node. Could my difficulties be the result of a botched update? I have tried booting the slaves with the prerelease CD as well as a floppy, so I don't think this is a problem with mismatched versions.
Thanks for any help,
Andrew Shewmaker