[scyld-users] Re: Scyld system mysteriously locks up
Kristen J. McFadden
kristen at cgcmail.cpmc.columbia.edu
Mon Mar 22 14:38:54 PST 2004
We experienced the same sort of thing until we started automatically
inserting a "sleep 5" or "sleep 10" between the dispatching of jobs.
Perhaps you could try that?
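
For illustration, here is a minimal sketch of that workaround, assuming the
jobs are launched as external commands from a Python script. The function
name, the delay value, and the command list are hypothetical and not taken
from our actual setup:

```python
import subprocess
import time
from typing import Callable, List, Sequence


def dispatch_with_delay(
    commands: Sequence[Sequence[str]],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[subprocess.Popen]:
    """Start each command in turn, pausing between consecutive launches.

    `commands` is a list of argument lists (hypothetical job commands).
    `sleep` is injectable so the pause can be replaced when testing.
    """
    procs: List[subprocess.Popen] = []
    for i, cmd in enumerate(commands):
        # Launch the job without waiting for it to finish.
        procs.append(subprocess.Popen(list(cmd)))
        # Pause before the next launch, but not after the last one.
        if i < len(commands) - 1:
            sleep(delay_seconds)
    return procs
```

Calling dispatch_with_delay(job_commands, delay_seconds=10) and then
wait() on each returned process gives the "sleep 10" variant. Whether 5 or
10 seconds is enough will depend on the cluster.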
From: scyld-users-admin at scyld.com [mailto:scyld-users-admin at scyld.com]
On Behalf Of Tim Whitcomb
Sent: Thursday, March 18, 2004 12:36 PM
To: scyld-users at scyld.com
Subject: [scyld-users] Re: Scyld system mysteriously locks up
> I purchased a 4 node, 8 processor Scyld (version 28) cluster
> approximately 6 months ago. About 5 days ago, it started mysteriously
> locking up on me. Once it is locked up, I can't do anything except
> physically reboot the machine.
> Unfortunately, I am rather new to Linux clusters and, since it worked
> "right out of the box", I have had no experience in troubleshooting.
> Can someone give me an idea of where I should start?
> I have the BIOS on all machines set to do a full memory check on
> startup and the /var/log/message file shows nothing.
This sounds suspiciously like a problem we've been fighting for the past
year at least. Are the machines actively running a job when they lock
up or are they sitting idle? I've done some tests that seem to suggest
that our system does not like the same job being run on both processors
of the same machine. Where did you purchase your equipment from, what
kind of processors are in it, what kind of interconnect are you using,
and what is the motherboard in the machines?
Timothy R. Whitcomb
Applied Physics Lab
University of Washington
mail: twhitcomb at apl dot washington dot edu
voice: (206) 543-2663