[Beowulf] HPC fault tolerance using virtualization

John Hearns hearnsj at googlemail.com
Tue Jun 16 02:05:07 PDT 2009


2009/6/16 Kilian CAVALOTTI <kilian.cavalotti.work at gmail.com>

> My take on this is that it's probably more efficient to develop checkpointing
> features and recovery in software (like MPI) rather than adding a
> virtualization layer, which is likely to decrease performance.
>
The performance hits measured by Panda et al. on InfiniBand-connected
hardware are of the order of 5 percent (I may be wrong here). I believe that
if we can get features like live migration of failing machines, plus
specialized, stripped-down virtual machines specific to job types, then we
will see virtualization becoming mainstream in HPC clustering.
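For comparison, the software-side alternative Kilian describes is application-level checkpoint/restart. The sketch below is a minimal illustration only, assuming a hypothetical iterative job whose whole state fits in a small Python dict; `run_with_checkpoints`, its parameters, and the `fail_at` crash simulation are all hypothetical and not taken from any MPI library.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(path, total_steps, checkpoint_every, fail_at=None):
    """Hypothetical iterative job that resumes from its last checkpoint."""
    # Resume from an existing checkpoint if one is present; otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        # Hypothetical failure injection, standing in for a node going down.
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")

        # Stand-in for one unit of real work.
        state["total"] += state["step"]
        state["step"] += 1

        if state["step"] % checkpoint_every == 0:
            # Write to a temp file in the same directory, then atomically
            # rename it over the old checkpoint, so a crash mid-write never
            # leaves a truncated checkpoint behind.
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, path)

    return state
```

Running it once with `fail_at=7` and then again without it resumes from the step-5 checkpoint, so only two steps of work are redone. The cost of this approach is that every application has to be taught what its state is, whereas live migration of a whole virtual machine does not need that knowledge.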