Strange hardware (was Re: custom hardware (was: Xbox clusters?))

Felix Rauch rauch at inf.ethz.ch
Thu Nov 29 02:42:31 PST 2001


On Thu, 29 Nov 2001, Daniel Pfenniger wrote:
> I have seen similar strange behavior of some boxes in a set of 66's,
> and the way to restart is also rather odd.
[...]

We recently had strange problems with a Dell box that had been
working without problems for several years in our small research
cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel
was unable to start the second CPU. It could see the second CPU, but
when it tried to start it up during boot, it got a timeout and so
continued with only one CPU.

So we thought that one of the CPUs had died and replaced both CPUs.
Still the same problem. Next we replaced the motherboard (including
the power supply). Still the same problem. Maybe the disk had
corrupted the kernel, so we installed a fresh copy of the same kernel
onto the box. Still the same problem. Only after we physically
replaced the SCSI hard disk did everything work properly again.

We are still wondering why a disk could cause a CPU to time out during
boot...

- Felix
-- 
Felix Rauch                      | Email: rauch at inf.ethz.ch
Institute for Computer Systems   | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
CH - 8092 Zuerich / Switzerland  | Fax:   ++41 1 632 1307



