[Beowulf] HPC fault tolerance using virtualization

Mike Davis jmdavis1 at vcu.edu
Tue Jun 16 12:06:39 PDT 2009

John Hearns wrote:
> 2009/6/16 Egan Ford <egan at sense.net>
>     I have no idea the state of VMs on IB.  That can be an issue with
>     MPI.  Believe it or not, but most HPC sites do not use MPI.  They
>     are all batch systems where storage I/O is the bottleneck. 
> Burn the Witch! Burn the Witch!
> Any HPC installation, if you want to show it off to alumni, august 
> committees from grant awarding bodies etc.  and not get sand kicked in 
> your face from the big boys in the Top 500 NEEDS an expensive 
> infrastructure of various MPI libraries. Big, big switches with lots 
> of flashing lights. Highly paid, pampered systems admins who must be 
> treated like expensive racehorses, and not exercised too much every 
> day. They need cool beers on tap and luxurious offices to relax in 
> while they prepare to do that vital half hours work per day which 
> keeps your Supercomputer flashing away and making noises.
I realize that this is humor, but one must remember just how sensitive 
System Admins can be before making such statements. I would like to 
refer you to the BOFH (Bastard Operator from Hell), or as I like to call 
it, the SysAdmin's guide to interpersonal relationships. Remember what 
these people do and, more importantly, what they can do.

On a serious note, who else gets out of bed at 3 am because an 
automated system indicates an issue with an HPC research cluster, or 
because the Computing Center calls to say fresh water has been cut off and 
the building is warming, or that the water pumps (dual for redundancy but 
sharing one controller, now that's engineering) have failed, or that 
machine room power is dirty because half of the battery bank has shorted 
and the other half can't supply all of the needed clean power, etc., etc.?

In my experience, sysadmins don't want beer or luxurious offices; they 
want the tools they need, proper managerial support, and respect.


Mike Davis			Technical Director
(804) 828-3885			Center for High Performance Computing
jmdavis1 at vcu.edu		Virginia Commonwealth University

"Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity."  George S. Patton
