Strange hardware (was Re: custom hardware (was: Xbox clusters?))
Mark at MarkAndrewSmith.co.uk
Thu Nov 29 05:50:58 PST 2001
Yep, we've seen this problem many times in our computer hire range of
Windows 2000 Pro machines. The strange thing is that we only see it on
Slot 1 Pentium II machines with various motherboard models. Our Pentium III
machines all use Socket 370 and have no problems. So we came to suspect that
the problem was the way the Slot 1 Pentium II sits on the motherboard.
After months of clients returning equipment to base under warranty, we
issued instructions on how to open the case and remove and reseat the
Pentium II Slot 1 processor package. After that, the machines boot every
time they are switched on.
How many of you who have this problem see it with Slot 1 Pentium II or
Slot 2 Pentium III processors in your clusters? I bet none of you see it
with a Socket 370 or other "flat" socket-type CPU package. We're
fortunate that our development cluster is built from "old" ex-hire
Pentium MMX 233 MHz equipment, so we don't have this problem on the
cluster. Yet!
Regards,
Mark.
-----Original Message-----
From: Felix Rauch [SMTP:rauch at inf.ethz.ch]
Sent: Thursday 29 November 2001 12:00
To: beowulf at beowulf.org
Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?))
On Thu, 29 Nov 2001, Daniel Pfenniger wrote:
> I have seen similar strange behavior of some boxes in a set of 66's,
> and the way to restart is also rather odd.
[...]
We recently had strange problems with a Dell box that had been working
without problems for several years in our small research cluster. It's
a dual PII 400 MHz box, but suddenly the Linux kernel was unable to
start the second CPU. It could see the second CPU, but when it tried to
start it up during boot, it got a timeout and so continued with only
one CPU.
So we thought that one of the CPUs had died and replaced both CPUs.
Still the same problem. Next we replaced the motherboard (including the
power supply). Still the same problem. Suspecting that the disk had
corrupted the kernel, we installed a fresh copy of the same kernel onto
the box. Still the same problem. Only after physically replacing the
SCSI hard disk did everything work properly again.
We are still wondering why a disk could cause a CPU to time out during
boot...
- Felix
--
Felix Rauch | Email: rauch at inf.ethz.ch
Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489
CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf