[Beowulf] Re: Cluster newbie, power recommendations

Donald Becker becker at scyld.com
Tue Mar 21 15:06:51 PST 2006

On Tue, 21 Mar 2006, John Hearns wrote:

> On Tue, 2006-03-21 at 13:06 -0500, Joe Landman wrote:
> > Not sure of the performance impact of this, but you could look at OpenVZ 
> > or Xen as well (when it is ready).  
> Xen has very little impact on performance. I saw some very good figures
> at a recent presentation at FOSDEM.

Have you tried Xen yourself?  With real-life applications?  I think 
you'll find that the earlier published numbers were based on selected 
operations that benchmarked favorably.

> I guess the biggest drawback for MPI type work would be the emulated NIC
> is pretty outdated. I think I remember Ian Pratt saying this will be
> changed.

Para-virtualization can be pretty efficient for computational work, but 
not for communication or I/O.  If it's emulating the NIC registers, sending a 
bunch of small, latency-sensitive packets can be pretty painful. What you 
need is a more efficient communication path to the underlying OS or hardware.  
There are several approaches, but all have obvious drawbacks.

  Enable direct access to the physical NIC hardware.  This can be done 
  with little overhead, but now you can only have one virtual machine on 
  the physical machine, and cannot migrate the VM.  (This same limitation
  applies to local disks as well.)

  Emulate a real-life NIC, much like VMWare emulates an AMD LANCE.  This 
  involves CPU overhead to mimic the hardware registers and bus 
  transactions, as well as the quirks of the actual device.

  Emulate an ideal virtual NIC, instead of a real-life one.  You have to 
  write a device driver for each OS, but you can make the Host OS 
  emulation simpler.  (VMWare emulates an old LANCE design to minimize 
  complexity, but adds back some modern features in an easier-to-emulate
  form.)

A better approach is recognizing that para-virtualization involves hacking 
the OS anyway.  You can create a new NIC interface model that 
allows e.g. page flipping with the host OS to enable lower overhead 
communication.  But now you are touching more than the device driver; you 
are reaching into the buffer and memory management of the OS.

The storage interface has some of the same issues.  It does have the 
advantages of dealing with whole blocks, not being as latency 
sensitive, and allowing buffering/read-ahead/write-behind.  But the guest 
OS inconveniently expects that it has exclusive access to blocks that will 
still be there later ;->.

Many of these same issues remain even when we have VT or Pacifica.  Unless 
the underlying devices are designed with virtualization in mind, and both
the Host OS and Guest OS know how to handle that specific hardware, there 
will be run-time overhead for virtual machines.

Hmmm, I almost drifted into the topic of "perhaps there is a better layer 
to virtualize at".  Some of the list readers know where that one ends 
up.  Instead I'll keep the bottom line on-subject: Virtualizing at the 
machine level inherently has overhead, and it's still pretty noticeable 
with the current implementations.

Donald Becker				becker at scyld.com
Scyld Software	 			Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220		www.scyld.com
Annapolis MD 21403			410-990-9993
