[Beowulf] Re: Cluster newbie, power recommendations
becker at scyld.com
Tue Mar 21 15:06:51 PST 2006
On Tue, 21 Mar 2006, John Hearns wrote:
> On Tue, 2006-03-21 at 13:06 -0500, Joe Landman wrote:
> > Not sure of the performance impact of this, but you could look at OpenVZ
> > or Xen as well (when it is ready).
> Xen has very little impact on performance. I saw some very good figures
> at a recent presentation at FOSDEM.
Have you tried Xen yourself? With real-life applications? I think
that you'll find the earlier published numbers selected operations that
show virtualization in the best light.
> I guess the biggest drawback for MPI type work would be the emulated NIC
> is pretty outdated. I think I remember Ian Pratt saying this will be
Para-virtualization can be pretty efficient for computational work, but
not communication or I/O. If it's emulating the NIC registers, sending a
bunch of small, latency-sensitive packets can be pretty painful. What you
need is a more efficient communication to the underlying OS or hardware.
There are several approaches, but each has obvious drawbacks.
Enable direct access to the physical NIC hardware. This can be done
with little overhead, but now you can only have one virtual machine on
the physical machine, and cannot migrate the VM. (This same limitation
applies to local disks as well.)
Emulate a real-life NIC, much like VMWare emulates an AMD LANCE. This
involves CPU overhead to mimic the hardware registers and bus
transactions, as well as the quirks of the actual device.
Emulate an ideal virtual NIC, instead of a real-life one. You have to
write a device driver for each OS, but you can make the Host OS
emulation simpler. (VMWare emulates an old LANCE design to minimize
complexity, but adds back some modern features in an easier-to-emulate
form.)
A better approach is recognizing that para-virtualization involves hacking
the OS anyway. You can create a new NIC interface model that
allows e.g. page flipping with the host OS to enable lower overhead
communication. But now you are touching more than the device driver, you
are reaching into the buffer and memory management of the OS.
The storage interface has some of the same issues. It does have the
advantages of dealing with whole blocks, not being as latency
sensitive, and allowing buffering/read-ahead/write-behind. But the guest
OS inconveniently expects that it has exclusive access to blocks that will
still be there later ;->.
Many of these same issues remain even when we have VT or Pacifica. Unless
the underlying devices are designed with virtualization in mind, and both
the Host OS and Guest OS know how to handle that specific hardware, there
will be run-time overhead for virtual machines.
Hmmm, I almost drifted into the topic of "perhaps there is a better layer
to virtualize at". Some of the list readers know where that one ends
up. Instead I'll keep the bottom line on-subject: Virtualizing at the
machine level inherently has overhead, and it's still pretty noticeable
with the current implementations.
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993