[Beowulf] Monitoring crashing machines

Carsten Aulbert carsten.aulbert at aei.mpg.de
Tue Sep 9 23:22:50 PDT 2008



Robert G. Brown wrote:
>
>  "putting a cheap monitor on a suspect or crashed node"
> 

One monitor for more than 1300 1U servers is not practical :)

> Or even after a crash.  If the primary graphics card is being used as a
> console, the frame buffer will probably retain the last kernel oops
> written to it (if any) even after it locks up the system proper.  Just
> plug a monitor into the framebuffer of a machine that has crashed and
> see if there is anything there.

Yes, that is what we already do (we call it the "crash cart"). We do see
some related messages, but usually no scrollback buffer is available
anymore, so the important lines are mostly lost.

> 
> One last method (from back in the dark ages):
> 
>  "putting a tty-output printer on as a console printer"
> 

Again, can you imagine
(1) getting 1300 of these, and
(2) then employing enough students to refill the paper ;)

*chuckling*

Cheers

Carsten


