[Beowulf] VMC - Virtual Machine Console

Wed Jan 16 07:31:11 PST 2008

Douglas Eadline wrote:
> I get the desire for fault tolerance  etc. and I like the idea
> of migration. It is just that many HPC people have spent
> careers getting applications/middleware as close to the bare
> metal as possible. The whole VM concept seems orthogonal to
> this goal. I'm curious how people are approaching this
> problem.
>   
Like many things, the devil is in the details. While I don't want to be as
prodigious as rgb, I want to mention a few things and ask some questions:

- With multi-core processors, to get the best performance you want to
   pin each process to a core. But this can cause problems when moving
   a process or creating a checkpoint. For example, VMware explicitly
   tells you not to do this. While I can't state their position, in
   general the idea is that restarting a checkpointed VM may have
   problems when a process is pinned to a core (even more so if the CPU
   is different). Also, moving a pinned process to another node may cause
   problems if the nodes differ in pretty much any way (it may also be
   affected by what's already running on the new node).

- As Ashley pointed out, the network aspect is still very problematic.
   Getting good performance out of a NIC in a VM is not easy and, from
   what I understand, difficult or impossible to do with multi-core nodes.
   (I would love to hear if someone has gotten very good performance out
   of a NIC in a VM when other VMs are also using the same NIC. Please
   give as many details as possible.)

- As Meng mentioned, IO is still problematic (I think for the same reasons
   that interconnects are).

- I haven't seen any benchmarks run in VMs using several nodes with
   an interconnect. Does anyone know of any?

- Has anyone tried moving processes around to different nodes for an
   MPI job? I'm curious what they found.
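
For anyone who hasn't pinned a process by hand, here is a minimal sketch of
what the first point above refers to. It assumes a Linux node and uses
Python's os.sched_getaffinity/os.sched_setaffinity; the function name is
made up for illustration, and this is not how VMware or any MPI launcher
does it internally.

```python
import os


def pin_to_first_allowed_core(pid=0):
    """Pin a process (pid 0 = the caller) to a single core.

    Returns the allowed-core sets before and after pinning.
    Hypothetical helper for illustration only; Linux-only.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity calls need Linux")
    before = os.sched_getaffinity(pid)   # cores the scheduler may use now
    core = min(before)                   # pick the lowest-numbered allowed core
    os.sched_setaffinity(pid, {core})    # restrict the process to that core
    after = os.sched_getaffinity(pid)
    return before, after


if __name__ == "__main__":
    before, after = pin_to_first_allowed_core()
    print("allowed cores before pinning:", sorted(before))
    print("allowed cores after pinning: ", sorted(after))
```

The core number is baked into the process state once it is pinned, which
is the root of the concern: if the process is restarted or moved somewhere
with a different core layout, that number may not mean the same thing.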

I would like to see virtualization take off in HPC, but I have to see a few
demos of things working and I need to see reasons why I should adopt
it. Right now I don't relish taking my "High" Performance Computing
system and turning it into "Kind-of-High" Performance Computing because
it would allow non-code-specific checkpointing or movement of processes.
Losing 10% in performance, for example, is a big deal in HPC, and I haven't
yet seen benefits from virtualization that justify giving up that 10% (I'm
dying to be shown to be wrong though).

The only aspect of virtualization that could make some sense in HPC is
what rgb mentioned - allowing the user to select an OS as part of their
job and installing or tearing down the OS as part of the job. I can see this
being very useful if the details could be worked out (I know there are
people working on it but I haven't seen any large demonstrations of it yet
and I would really like to see such a beastie).

Anyway, my 2 cents (and probably my last since this topic falls under
Landman's Rule of flammability).

Jeff