[Beowulf] HPC fault tolerance using virtualization
hearnsj at googlemail.com
Mon Jun 15 10:59:37 PDT 2009
I was doing a search on ganglia + ipmi (I'm looking at doing such a
thing for temperature measurement) when I came across this paper:
Proactive Fault Tolerance for HPC using Xen virtualization
It's something I've wanted to see working - doing a Xen live migration
of a 'dodgy' compute node, and the job just keeps on trucking.
Looks as if these guys have it working. Anyone else seen similar?