[Beowulf] Checkpointing using flash

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Sat Sep 29 06:46:00 PDT 2012



On 9/29/12 2:29 AM, "Justin YUAN SHI" <shi at temple.edu> wrote:

>I missed this thread. Got busy with classes. Sorry.
>
>Going back to Jim's comments on InfiniBand and OSI and MPI. I see that
>exascale computing requires us to rethink MPI's insistence on sending
>messages directly. Even with the group communicators, the implementation
>insists on the same.
>
>The problem with direct communication is that you leave the
>application without recourse when the transmission fails. As we have
>discussed, any transient fault can cause that to happen. It is
>practically impossible to provide redundancy for every transmission
>unless we change our API design to eliminate the reliable
>communication assumption. Application-level re-transmission will
>allow the application to survive NOT only communication failures
>but also node failures (when you lose a chunk of memory). But the MPI
>semantics do not allow this to happen, even if the implementation
>tries to re-transmit a failed message.


So what you're thinking is that the conceptual message passing should be
more like UDP sockets?

That is, we explicitly accept that a "send" might not work and, in fact,
may "fail silently".


Yes. That is a key aspect, and the higher-level algorithm that uses it
needs to explicitly account for it: by multiple transmissions, multiple
paths, coding (in the ECC sense) or something else.
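
To make the "multiple transmissions" option concrete, here is a minimal
sketch of application-level retransmission over a send that can fail
silently. It is a toy simulation, not MPI and not a real network API: the
names LossyChannel, send_with_retry and unique_messages are hypothetical,
and the drop probability, the acknowledgement scheme and the attempt limit
are all assumptions made for illustration.

```python
import random


class LossyChannel:
    """Hypothetical unreliable transport: a send may be dropped silently."""

    def __init__(self, drop_probability, seed=0):
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)   # seeded so runs are repeatable
        self.delivered = []              # what the receiver actually got

    def send(self, seq, payload):
        # Like a UDP datagram: no error is raised when the message is lost.
        if self.rng.random() < self.drop_probability:
            return None                  # lost in transit, sender sees nothing
        self.delivered.append((seq, payload))
        # The acknowledgement crosses the same lossy medium (assumption).
        if self.rng.random() < self.drop_probability:
            return None                  # message arrived, but the ack was lost
        return seq                       # ack carrying the sequence number


def send_with_retry(channel, seq, payload, max_attempts=10):
    """Resend until acknowledged; return attempts used, or None on failure."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(seq, payload) == seq:
            return attempt
    return None                          # the caller must handle this case


def unique_messages(channel):
    """Receiver side: drop duplicates caused by retransmission, keyed by seq."""
    seen = {}
    for seq, payload in channel.delivered:
        seen.setdefault(seq, payload)
    return seen


if __name__ == "__main__":
    channel = LossyChannel(drop_probability=0.3, seed=42)
    for seq in range(5):
        attempts = send_with_retry(channel, seq, f"block-{seq}")
        print(seq, "acked after", attempts, "attempt(s)")
    print("receiver holds", len(unique_messages(channel)), "distinct messages")
```

Two things fall out of the sketch: the sender has to treat "no ack" as a
normal outcome rather than an error, and the receiver has to tolerate
duplicates, because a lost ack looks the same to the sender as a lost
message.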



