[Beowulf] non-stop computing
guy.coates at gmail.com
Thu Oct 27 08:38:15 PDT 2016
BLCR and DMTCP should both be able to checkpoint a single-node job (single-
or multi-threaded) straight out of the box; you won't need to recompile any
of your binaries.
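
A rough sketch of what that looks like with DMTCP, assuming the dmtcp_launch,
dmtcp_command and dmtcp_restart tools are on your PATH and the default
coordinator settings are in use. The program name ./my_app and its arguments
are hypothetical placeholders, and the helper functions are only for
illustration; they are not part of DMTCP itself.

```python
import glob
import shutil
import subprocess


def dmtcp_steps(program, args=()):
    """Build the DMTCP command lines for launch, checkpoint and restart."""
    # Checkpoint images are assumed to be in the current working directory.
    images = sorted(glob.glob("ckpt_*.dmtcp"))
    return {
        # Start the unmodified binary under DMTCP control.
        "launch": ["dmtcp_launch", program, *args],
        # Ask the coordinator to checkpoint the running job.
        "checkpoint": ["dmtcp_command", "--checkpoint"],
        # Resume from the checkpoint images found above (if any).
        "restart": ["dmtcp_restart", *images],
    }


def run_step(cmd):
    """Run one step, or only print it when DMTCP is not installed."""
    if shutil.which(cmd[0]) is None:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # ./my_app and its arguments are hypothetical.
    for name, cmd in dmtcp_steps("./my_app", ["--input", "data.txt"]).items():
        print(name + ":", " ".join(cmd))
```

Note that the launch step blocks until the job exits, so the checkpoint
request has to come from a second shell or process, and the restart step only
makes sense once at least one checkpoint image exists.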
DMTCP does not require any kernel modules, so you might find it easier
going if you are on a more recent kernel than BLCR supports. (DMTCP also
seems to do a better job of handling MPI jobs than BLCR does, if you care
Dr. Guy Coates