[Beowulf] clustering using xen virtualized machines

Gavin Burris bug at sas.upenn.edu
Thu Jan 28 08:23:30 PST 2010

Two staff couldn't handle 2k users and 100 departments, or that much
hardware.  Answering tickets or emails alone would be overwhelming.
Building and maintaining the VMs, or training, documenting for, and
helping the departments to build their own VMs, is a monumental task in
and of itself.  A more realistic number is 1 FTE per 4 HPC-using
departments.

I would wager that generalizing and not targeting any particular
performance aspect will only cause the departments to pool their own
money and build their own targeted resource, for less money, with a grad
student and an O'Reilly book.  I find that most users only have time for
their application workflow or domain-specific coding, not to be system
programmers making VMs.

Sorry, I'm not drinking the virtualization/cloud Kool-Aid.  I'd love to
have everything abstracted and easy to manage, but I find standardizing
on an OS or two and keeping things as stock as possible is easier and
cheaper to manage at this point.  In my situation, virtualization just
adds complexity and has a price/performance penalty.


On 01/28/2010 10:10 AM, Mark Hahn wrote:
>> I don't buy the argument that the winning case is packaging up a VM with
>> all your software.  If you really are unable to build the required
>> software stack for a given cluster and its OS, I think using something
> you're right, but only for narrow-function clusters.  suppose you have a
> cluster used by 2k users across a handful of different universities
> and 100 departments.  and have, let's say, 2 staff.  it's conceivable
> that using VMs would permit a higher level of service by putting more
> configuration flexibility into the hands of the users.  yes, most would
> use a standard image (which might be the bare-metal one, actually),
> but making it easier to accommodate variance is valuable.
> it even offers the ability to shift the model - instead of actually
> booting VMs on nodes for a job, how about just resurrecting a number
> of VM instances (freeze-dried in already-booted state)?  that makes the
> setup latency potentially much lower.  (pages from a VM image can
> be fetched lazily afaik, and presumably also COW.)
> for the few HPC-oriented performance studies of VMs I've seen,
> the only slowdowns were for OS activity (IO, page allocation, etc).
> an ideally-behaved HPC app minimizes those already, so...
