[Beowulf] hang-up of HPC Challenge

Chris Samuel csamuel at vpac.org
Sun Sep 7 01:58:32 PDT 2008


----- "Mikhail Kuzminsky" <kus at free.net> wrote:

Hi Mikhail,

Sorry for the delay in getting back to you, work has
been keeping me very occupied!

> In message from Chris Samuel <csamuel at vpac.org> (Wed, 20 Aug 2008 
> 11:12:52 +1000 (EST)):
>
> >Does the code crash, does it just stop & idle, does it
> >busy loop, does the node oops, does it lock up, etc.?
> 
> I believe that a program crash is not a hang-up. When I wrote
> about a Linux hang-up, I meant that Linux doesn't respond to
> any interrupts - from the keyboard, from ssh client requests, etc.

That really sounds like you're hitting either a kernel or
hardware issue - it might be worth trying out the BreakIn
tool that Jason posted about elsewhere on the list:

http://www.advancedclustering.com/software/breakin.html

> I use 2.6.22.5-31 kernel from SuSE 10.3 distribution.

That's pretty old now. I'd strongly suggest trying out
the current mainline kernel on there; it works pretty
well on our SuperMicro-based Barcelona cluster.

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


