Processes get SIGKILL after about 15 secs (Scyld)
Thomas Clausen
tclausen at wesleyan.edu
Wed Mar 26 14:23:42 PST 2003
Hi,
I have a problem I can't solve:
We are running Scyld with kernel
Linux version 2.4.17-0.18.18_Scyldsmp (support at builder.scyld.com) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Jul 11 18:54:56 EDT 2002
on 70 nodes. 20 of them are newly acquired dual Athlons on Tyan 2466 boards.
When I start a process on any of these new nodes (e.g. bpsh 64 sleep 500), it
gets a SIGKILL after about 15 seconds:
[pid 5685] nanosleep({500, 0}, 0) = -1 EINTR (Interrupted system call)
[pid 5684] <... select resumed> ) = 3 (in [4 5 6], left {286, 370000})
[pid 5685] +++ killed by SIGKILL +++
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
close(4) = 0
read(5, "", 4096) = 0
close(5) = 0
read(6, "", 4096) = 0
close(6) = 0
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], 0, NULL) = 5685
write(2, "bpsh: Child process exited abnor"..., 39bpsh: Child process exited abnormally.
) = 39
wait4(-1, 0xbffff548, 0, NULL) = -1 ECHILD (No child processes)
_exit(255) = ?
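The wait4() line in the trace is bpsh reaping its child and decoding the status word with WIFSIGNALED/WTERMSIG. Below is a minimal Python sketch of that same check, assuming a POSIX system. run_and_classify and its kill_after parameter are hypothetical names; kill_after makes the parent send SIGKILL itself, only to stand in for the unknown sender. Keep in mind that SIGKILL cannot be caught or blocked, so the killed process gets no chance to log anything, and the status word records only which signal ended the child, not who sent it.

```python
import os
import signal
import time


def run_and_classify(argv, kill_after=None):
    """Fork/exec argv and report how the child ended, like bpsh's wait4() check.

    kill_after is a hypothetical test knob: if set, the parent itself sends
    SIGKILL after that many seconds, standing in for the unknown external sender.
    """
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image, as sleep does under bpsh.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec fails
    if kill_after is not None:
        time.sleep(kill_after)
        os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)  # same status word that wait4() fills in
    elapsed = time.monotonic() - start
    if os.WIFSIGNALED(status):
        # Equivalent of WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL in the trace.
        sig = signal.Signals(os.WTERMSIG(status))
        return ("signaled", sig.name, elapsed)
    return ("exited", os.WEXITSTATUS(status), elapsed)
```

For example, run_and_classify(["sleep", "500"], kill_after=0.2) should report a "signaled"/"SIGKILL" result after roughly 0.2 s. Wrapping a command the same way without kill_after would at least timestamp when the kill arrives, but it cannot reveal where it came from.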
I have tried to find out where the signal comes from, without success.
memtest86 (booted from floppy) runs on these machines and the hardware
seems to be fine. I'm at a loss...
Thomas
--
.^. Thomas Clausen, graduate student
/V\ Physics Department, Wesleyan University, CT
// \\ Tel 860-685-2018, fax 860-685-2031
/( )\
^^-^^ Use Linux