[Beowulf] Checkpointing using flash

Hearns, John john.hearns at mclaren.com
Fri Sep 21 07:49:32 PDT 2012


Frequent checkpointing will of course be vital for exascale, given the MTBF of individual nodes.

However, how accurate is this statement:

HPC jobs involving half a million compute cores ... have a series of checkpoints set up in their code with the entire memory state stored at each checkpoint in a storage node.
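
For a rough sense of why node MTBF forces frequent checkpoints at this scale, here is a back-of-the-envelope sketch. It assumes independent node failures (so system MTBF is node MTBF divided by node count) and uses Young's first-order approximation for the checkpoint interval, sqrt(2 * checkpoint cost * system MTBF), which is only reasonable when the checkpoint cost is small compared with the system MTBF. The node MTBF, node count and checkpoint write time below are hypothetical figures chosen for illustration, not measurements from any real machine.

```python
import math


def system_mtbf_hours(node_mtbf_hours, node_count):
    """System MTBF, assuming independent nodes with exponential failures."""
    return node_mtbf_hours / node_count


def young_interval_hours(checkpoint_cost_hours, mtbf_hours):
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


# Hypothetical inputs (assumptions for illustration only).
node_mtbf = 5 * 365 * 24      # 5 years per node, in hours
nodes = 30_000                # number of nodes in the job
checkpoint_cost = 10 / 60     # 10 minutes to write one full checkpoint

mtbf = system_mtbf_hours(node_mtbf, nodes)
interval = young_interval_hours(checkpoint_cost, mtbf)

print(f"System MTBF:         {mtbf:.2f} hours")
print(f"Checkpoint interval: {interval * 60:.0f} minutes (Young's approximation)")
```

With inputs like these, the system as a whole fails every hour or two even though each node lasts years, so the time taken to dump memory to storage becomes a large fraction of the run. That is the motivation for faster checkpoint targets such as flash.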

John Hearns | CFD Hardware Specialist | McLaren Racing Limited
