Strange hardware (was Re: custom hardware (was: Xbox clusters?))

Mark at Mark at
Thu Nov 29 05:50:58 PST 2001

Yep, we have seen this problem many times in our computer hire range of
Windows 2000 Pro machines.  The strange thing is that we only see it on Slot
1 Pentium II machines, across various motherboard models.  Our Pentium III
range is all Socket 370 and has no problems.  So we came to suspect that the
problem is the way the Slot 1 Pentium II sits on the motherboard.
After months of clients returning equipment to base under warranty, we
issued instructions on how to open the case, then remove and reseat the
Pentium II Slot 1 processor package.  After that, the machines boot every
time they are switched on.
How many of you who have this problem see it with Slot 1 Pentium II or
Slot 2 Pentium III processors in your clusters?  I bet none of you see it
with a Socket 370 or other "flat" socket type of CPU package.  We're
fortunate that our development cluster is built from "old" ex-hire
Pentium 233 MHz MMX equipment, so we don't have this problem on the
cluster.  Yet!
-----Original Message----- 
From: Felix Rauch 
Sent: Thursday 29 November 2001 12:00 
To: beowulf at 
Subject:	Strange hardware (was Re: custom hardware (was: Xbox
On Thu, 29 Nov 2001, Daniel Pfenniger wrote: 
> I have seen similar strange behavior of some boxes in a set of 66's,
> and the way to restart is also rather odd.
We recently had strange problems with a Dell box which has been
working without problems for several years in our small research
cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel
was unable to start the second CPU. It could see the second CPU, but
when it tried to start it up during boot, it got a timeout and so
continued with only one CPU.
We thought that one of the CPUs had died and replaced both CPUs. Still
the same problem. Next we replaced the motherboard (including the
supply). Still the same problem. Maybe the disk had corrupted the
kernel, so we installed a fresh copy of the same kernel onto the disk.
Still the same problem. Only after physically replacing the SCSI
hard disk was everything working properly again.
We are still wondering why a disk could cause a CPU to timeout during
- Felix 
