Slave node problem

Donald Becker becker at scyld.com
Wed Aug 22 10:26:47 PDT 2001


On Tue, 21 Aug 2001 ericf at whispers.org wrote:

> On Tue, 21 Aug 2001, Sean Dilda wrote:
> > On Mon, 20 Aug 2001, ericf at whispers.org wrote:
> > > Hi, I just recently installed the Scyld Beowulf software (27bz-7)
> > > and have run into some difficulty with slave nodes.
> > > apply.  The slave nodes appear to get an IP address then ...
> > > the machine reboots.
> > What happened after the machine rebooted?
..
> Well, when the machine reboots it completely reboots (as if I hit the
> reset switch or just turned it on).  It then boots off the floppy again
> and goes through the boot process: it grabs its IP after the RARP, says
> "neighbor table overflow", spits out some more info about taking down
> interfaces, and reboots.

The likely problem is a misconfigured network adapter.
What is the detection message for the NIC?

If the node cannot communicate with the master while booting, it will
reboot in 30 seconds.  This allows recovery from some types of transient
errors.
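A minimal sketch of that boot-time rule, assuming a plain deadline loop: the node keeps probing the master and, if it gets no answer within 30 seconds, gives up and reboots so the next attempt starts fresh. The `probe` callable, the `FakeClock` helper and the `simulate` fault model are hypothetical stand-ins for illustration, not part of the Scyld boot code.

```python
import time

BOOT_TIMEOUT_S = 30.0  # from the post: no contact with the master in 30 s -> reboot


def boot_attempt(probe, clock=time.monotonic, sleep=time.sleep,
                 timeout=BOOT_TIMEOUT_S, interval=1.0):
    """Return "continue" if the master answered before the deadline, else "reboot".

    probe is a hypothetical callable that returns True once the master answers.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return "continue"   # master answered: keep booting
        sleep(interval)         # wait a moment before probing again
    return "reboot"             # deadline passed: reboot and start over


class FakeClock:
    """Simulated clock so the sketch runs instantly instead of waiting 30 s."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


def simulate(master_reachable_on_boot, max_boots=5):
    """Return the boot number on which the node reached the master, or None.

    master_reachable_on_boot(n) is a hypothetical fault model: True if the
    master can be reached during boot n.
    """
    for boot in range(1, max_boots + 1):
        clk = FakeClock()
        result = boot_attempt(lambda: master_reachable_on_boot(boot),
                              clock=clk.now, sleep=clk.sleep)
        if result == "continue":
            return boot
    return None
```

A transient fault (say, the master is briefly busy during the first boot) clears on the second boot, so `simulate(lambda n: n >= 2)` returns 2. A persistent fault, such as a misconfigured NIC, times out on every boot; in this model that shows up as `simulate(lambda n: False)` returning `None`, which resembles the reboot loop described above.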


If the node reboots immediately _after_ downloading a new kernel, this
could be a (rare) problem with Two Kernel Monte and your specific
BIOS/motherboard.
What motherboard are you using, and what BIOS does it have?

The documented workaround for a T-K-M failure is to write a stage 2
kernel to the floppy.  You can make the stage 2 floppy image by running
  beoboot -2 -f

You may need to trim the list of modules in /etc/beowulf/config.boot to
fit on a floppy.
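For a sense of the budget: a standard 1.44 MB floppy holds 1,474,560 bytes (80 tracks x 2 sides x 18 sectors x 512 bytes). The sketch below assumes, as a simplification, that the image is just the kernel plus the chosen modules. It ignores compression and filesystem overhead, so it does not reproduce how beoboot actually sizes an image. It drops optional modules largest-first until the total fits, and it always keeps modules marked required, such as the driver for the node's own NIC. The function name, module names and sizes are all hypothetical.

```python
FLOPPY_BYTES = 80 * 2 * 18 * 512  # 1,474,560 bytes on a 1.44 MB floppy


def trim_modules(kernel_bytes, modules, required, capacity=FLOPPY_BYTES):
    """Return (sorted kept module names, total bytes).

    modules:  dict of module name -> size in bytes (hypothetical numbers).
    required: names that must stay, e.g. the driver for the node's own NIC.
    Optional modules are dropped largest-first until everything fits.
    Raises ValueError if even the required set does not fit.
    """
    kept = dict(modules)
    total = kernel_bytes + sum(kept.values())
    optional = sorted((m for m in kept if m not in required),
                      key=lambda m: kept[m], reverse=True)
    for name in optional:
        if total <= capacity:
            break
        total -= kept.pop(name)  # drop the largest remaining optional module
    if total > capacity:
        raise ValueError(f"required set needs {total} bytes; capacity is {capacity}")
    return sorted(kept), total


if __name__ == "__main__":
    sizes = {"nic_driver": 60_000, "extra_nic_a": 70_000, "extra_nic_b": 80_000,
             "scsi_driver": 400_000, "usb_driver": 150_000}
    print(trim_modules(900_000, sizes, required={"nic_driver"}))
```

Largest-first is only one simple policy. In practice you would drop the drivers for hardware your nodes do not have, which you can express here by marking every driver you do need as required.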

The downside of this approach is that the floppy is now tied to your
kernel version.  You cannot update the kernel on the master without
updating all of the boot images.
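One way to keep track of that coupling is to record, by hand, which kernel release each stage 2 floppy was built with, and to compare that list against the master whenever its kernel changes. The sketch assumes such a hand-kept inventory; the function name and image labels are hypothetical. `platform.release()` returns the running kernel's release string, the same value `uname -r` prints.

```python
import platform


def stale_boot_images(master_release, images):
    """Return labels of boot images built for a different kernel than the master's.

    images: hypothetical hand-kept inventory, label -> kernel release the
    floppy was built with.
    """
    return sorted(label for label, rel in images.items() if rel != master_release)


if __name__ == "__main__":
    master = platform.release()  # kernel release the master is running now
    inventory = {"node00-floppy": master, "node01-floppy": "OLD-RELEASE"}
    print(stale_boot_images(master, inventory))  # floppies that need rebuilding
```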

Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993




