[Beowulf] reboot without passing through BIOS?

David Mathog mathog at caltech.edu
Fri Aug 1 09:11:25 PDT 2008


Kilian CAVALOTTI <kilian at stanford.edu> wrote:
> I may be totally missing the point, but doesn't the memory need to be 
> physically (as in electrically) reset in order to clean out those bad 
> bits? And doesn't this require a hard reboot, for the machine to be 
> power cycled, so that memory cells are reinitialized? 

The errors I am talking about are random bit flips caused, for
instance, by ambient radiation.  When the OS reboots it overwrites
memory and so removes those errors.  The affected cells were not
damaged, just in the wrong state.  This should work so long as none of
the flipped bits prevents kexec from doing its job.  Presumably the OS
will also reinitialize any memory structures held elsewhere in hardware
(such as in storage controllers and NICs), since it should not trust
the BIOS to have done this.
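
For concreteness, here is a rough sketch of the kexec sequence being
discussed.  It only builds and prints the two commands and does not run
them.  The kernel and initrd paths are hypothetical placeholders, and
the helper name kexec_commands is mine, not part of kexec-tools.  The
-l, --initrd, --reuse-cmdline and -e options are the usual kexec-tools
ones, but check the man page for the version installed on your nodes.

```python
import shlex


def kexec_commands(kernel, initrd):
    """Return the two kexec invocations for a reboot that skips the BIOS.

    Nothing is executed here; the caller decides whether to run them.
    """
    # Step 1: stage the new kernel and initrd in memory, reusing the
    # command line of the currently running kernel.
    load = ["kexec", "-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"]
    # Step 2: jump straight into the staged kernel, bypassing firmware.
    execute = ["kexec", "-e"]
    return [load, execute]


if __name__ == "__main__":
    # Hypothetical paths; substitute whatever the node actually boots.
    for cmd in kexec_commands("/boot/vmlinuz", "/boot/initrd.img"):
        print(shlex.join(cmd))
```

Note that "kexec -e" jumps into the new kernel immediately, without a
clean shutdown, so in practice one would stop services and sync or
unmount filesystems first.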

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


