[Beowulf] Re: Node Boot Problem ~ No Keyboard/Harddrive/Diskdrive

Phil Smith pjgsmith at gmail.com
Fri Jul 9 07:16:27 PDT 2004

I've done further testing and have yet to resolve this problem.

The node boots perfectly fine via PXE. I have a very minimal
version of Debian being loaded into a ramdisk. However, I have also
tried a larger version of RedHat loaded into a ramdisk, as well as
simply using an NFS root filesystem. Each of these setups still
suffers from the following problem:

If none of the following devices is detected (keyboard, USB mass
storage device, hard drive), then the node crashes within 15 minutes.
When it crashes, the screen simply goes blank and the light on the NIC
goes out.

If a keyboard, USB mass storage device, or hard drive is detected
(or more than one of them), then the node stays up for about 24 hours
(but does eventually crash in the same fashion).
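
To pin down these crash times more precisely, a small heartbeat sender
along the following lines could run on each node, with the server
recording when the last message arrived. This is only a sketch: the
server address, port, and interval are placeholder assumptions, not
values from the actual cluster, and it assumes Python is available in
the node image.

```python
import socket
import time

# Hypothetical values: replace with the real log server and a free UDP port.
SERVER = ("192.168.0.1", 9999)
INTERVAL_SECONDS = 10


def read_uptime(path="/proc/uptime"):
    """Return seconds since boot, read from the Linux /proc/uptime file."""
    with open(path) as f:
        return float(f.read().split()[0])


def format_heartbeat(hostname, uptime_seconds):
    """Build one heartbeat line from a host name and an uptime in seconds."""
    return f"{hostname} up={uptime_seconds:.1f}"


def send_heartbeats(server=SERVER, interval=INTERVAL_SECONDS):
    """Send a UDP heartbeat forever; the last one received marks the crash."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    hostname = socket.gethostname()
    while True:
        msg = format_heartbeat(hostname, read_uptime())
        sock.sendto(msg.encode("utf-8"), server)
        time.sleep(interval)
```

Calling send_heartbeats() from the node's startup script and capturing
the packets on the server would give the uptime at the moment the node
went silent, to within one interval.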

I do believe that this crash is due to some sort of hardware
incompatibility; however, I have 64 of these identical nodes, so
replacing the hardware is not an option.

I'm currently puzzled as to what may be causing this problem. Possibly
some sort of glitch in some power-save code? I've disabled all the
power-save options in the kernel and still experience this problem.

Any help would be greatly appreciated,

 Phil Smith

On Wed, 23 Jun 2004 11:12:03 -0400, Phil Smith <pjgsmith at gmail.com> wrote:
> Hello all,
> I am currently trying to configure a Beowulf cluster and I'm having a
> strange problem. The nodes all use PXE and TFTP to acquire the kernel
> from the server and then commence booting.
> If the node has a keyboard/hard-drive/floppy-disk-drive plugged into
> it, then the system boots perfectly. However, if the node has none of
> these devices plugged in, then it crashes (screen goes blank, NIC
> light goes out). When exactly the node crashes is not consistent;
> however, it always occurs after the kernel has been transferred and
> before the login screen appears.
> I have tried debugging the problem with no success.
> I first thought that the node was trying to log the 'no keyboard
> error' to a local disk, for some reason, but I ruled that out when I
> saw that the node booted fine with a floppy drive present (with no
> floppy in the drive).
> I have already tried several different kernels with various settings
> (v2.4.20-31.9, v2.4.24, v2.2.26) and the exact same problem remains. I
> have stepped through the startup file and found nothing which should
> cause such an issue.
> The nodes are P4 2.5 GHz, 512 MB RAM, Intel D845GERG2 motherboards.
> There are 64 of them and all of them are diskless. I would rather not
> try to resolve this issue by 'buying 64 keyboards'.
> Any help/suggestions on resolving this issue would be greatly appreciated.
> Cheers,
> -Phil Smith

