[Beowulf] HPC fault tolerance using virtualization

John Hearns hearnsj at googlemail.com
Mon Jun 15 10:59:37 PDT 2009

I was doing a search on ganglia + ipmi (I'm looking at doing such a
thing for temperature measurement) when I came across this paper:


Proactive Fault Tolerance for HPC using Xen virtualization

It's something I've wanted to see working - doing a Xen live migration
of a 'dodgy' compute node, and the job just keeps on trucking.
Looks as if these guys have it working. Anyone else seen similar?

John Hearns
