[Beowulf] Redundant Array of Independent Memory - fork(Re: Checkpointing using flash)

Reuti reuti at staff.uni-marburg.de
Tue Sep 25 15:07:31 PDT 2012

On 25.09.2012 at 12:19, Andrew Holway wrote:

> <snip>
> I'm pretty sure faulty hardware is the root cause of our fault
> tolerance problems :). In any case, the main issue seems to be the loss
> of a chunk of your application memory when the node fails, not so much
> the retransmission of messages. MPI has some kind of functionality
> built in to address fault tolerance anyway.

If you are interested: there was a lot of discussion about fault tolerance (FT) in MPI-3. There is a mailing list:


-- Reuti
