[Beowulf] Re: Geriatric computer does not stay up

David Mathog mathog at caltech.edu
Thu Dec 17 10:32:15 PST 2009

Gus Correa <gus at ldeo.columbia.edu> wrote

> Some of the built-in 3Com Ethernet 100 interfaces on
> Tyan S2466[-4M] motherboards we have here became flaky/failed
> after many years of use.
> Those are main boards in several standalone workstations/PCs.
> I don't administer those systems, but I believe the symptoms
> were somewhat random, as those you describe.
> Disabling the onboard Ethernet (by jumper), and replacing them by
> PCI Ethernet 100 cards, gave those systems additional lifetime.
> Would this be the case of your cluster node?

Tyan S2466 MPX does not seem to have such a jumper.  Possibly it can be
disabled in the BIOS. Oddly, the system is fine PXE booting over that
interface, but every attempt at:

  service network start

hangs instantly.  Tried booting with a serial console like this from

LABEL serial
  KERNEL vmlinuz-
  APPEND initrd=initrd- root=/dev/hda3 failsafe

which uses the initrd and vmlinuz downloaded from the server, and the
disk from the iffy machine for the programs.  That booted fine, but the
kernel emitted nothing on the serial line when the machine hung.
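One thing worth ruling out for the silent serial line: the APPEND line as
shown carries no console= argument, and without one the kernel normally sends
its messages to the VGA console only. The entry above may simply be
abbreviated, so this is only a sketch of a check, not a diagnosis. The helper
name has_serial_console is hypothetical; /proc/cmdline is where a running
Linux kernel exposes the arguments it was booted with.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the kernel command line
# names a serial console such as console=ttyS0,115200n8.
# $1 is the file to inspect; defaults to the running kernel's command line.
has_serial_console() {
    grep -q 'console=ttyS' "${1:-/proc/cmdline}"
}
```

Run on the booted node as "has_serial_console && echo serial console set".
If it reports nothing, adding something like console=ttyS0,115200n8 to the
APPEND line (port and speed are assumptions here) would be the next thing to
try.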

Running smartctl now; after that I will boot a rescue Linux to see whether
that too has network issues.
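For reference, a minimal sketch of a typical smartctl pass, assuming the disk
is /dev/hda (inferred from root=/dev/hda3 above). The function name
smart_check and the SMARTCTL override variable are hypothetical and exist only
so the commands can be dry-run; the smartctl options themselves (-H, -A,
-l error, -t long) are standard smartmontools flags.

```shell
#!/bin/sh
# Hypothetical wrapper around smartctl; normally run as root.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) to print the
# commands instead of running them.
SMARTCTL="${SMARTCTL:-smartctl}"

smart_check() {
    disk="$1"
    "$SMARTCTL" -H "$disk"        # overall health self-assessment
    "$SMARTCTL" -A "$disk"        # vendor attributes (reallocated sectors etc.)
    "$SMARTCTL" -l error "$disk"  # drive's logged errors
    "$SMARTCTL" -t long "$disk"   # start the extended offline self-test
}
```

Usage would be "smart_check /dev/hda". The long self-test runs in the
background on the drive; its result can be read later with
"smartctl -l selftest /dev/hda".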

Ran memory tests for over 20 hours without a single hiccup.

I'll keep looking.  Thanks all.

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

