[Beowulf] Checkpointing MPI applications
    Scott Atchley 
    e.scott.atchley at gmail.com
       
    Mon Mar 27 16:27:59 UTC 2023
    
On Thu, Mar 23, 2023 at 3:46 PM Christopher Samuel <chris at csamuel.org> wrote:
> On 2/19/23 10:26 am, Scott Atchley wrote:
>
> > We are looking at SCR for Frontier with the idea that users can store
> > checkpoints on the node-local drives with replication to a buddy node.
> > SCR will manage migrating non-defensive checkpoints to Lustre.
>
> Interesting, does it really need local storage or can it be used with
> diskless systems via tricks with loopback filesystems, etc?
Yes, it only needs a mount path. It can be ramfs/tmpfs, XFS (or another local
file system), etc.
Scott