[Beowulf] clustering using xen virtualized machines

Ashley Pittman ashley at pittman.co.uk
Fri Jan 29 13:55:04 PST 2010

On 26 Jan 2010, at 19:37, Paul Van Allsburg wrote:
> Ashley Pittman wrote:
>> On 25 Jan 2010, at 15:28, Jonathan Aquilina wrote:
>>> Has anyone tried clustering using Xen-based VMs? What is everyone's take on that? It's something that popped into my head while in my lectures today.
>> I've been using Amazon EC2 for clustering for months now; from a software perspective it's very similar to running on real hardware. For my needs (development) it's perfectly adequate, though I've not benchmarked it against running the same code on the raw hardware.
> I'd love to try clustering on Amazon.

It's really easy.

> Is there a good writeup somewhere on how to configure & use MPI in the cloud?

I'm not sure one is needed. As a bit of background, I develop and support an open source debugging tool for parallel applications (see my sig for details). As such I run a lot of parallel apps, but purely to have something to test padb against, so I'm not bothered about performance; I just need a running job to interrogate. What is important for me (or rather my tool) is that it works in different environments, so I run with a variety of clustering software.

With Amazon I can boot any number of machine "instances" and pay $0.085 (8.5c) per hour for each one. Typically I run four at a time, but I've run with up to twenty. Once the instances are booted there is no difference between using them and using real machines. I regularly use Slurm, OpenMPI (ORTE and under Slurm) and MPICH2 (mpd, hydra and under Slurm), and I've yet to find any way in which the setup differs from running on real metal.

For persistent storage I pay for an EBS volume, which I attach to one VM and NFS-export to the others, which use it as a shared /home. Each instance also comes with a large scratch partition, but I typically don't use it at all. I have a bunch of scripts for populating the hosts files and adding user accounts, and that's all there is to it. For the EBS volume you simply pick the size you need, create the volume, attach it to a VM and then run mkfs.ext3 as normal. This volume is persistent and is charged per GB per calendar month rather than per instance-hour.
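The post doesn't include the setup scripts themselves. As a rough illustration only, the following Python sketch shows the kind of text such scripts might generate from a list of instances: /etc/hosts entries, an Open MPI hostfile using the `slots=` syntax, and an /etc/exports line sharing /home from one node over NFS. The helper names, hostnames, IP addresses and export options are all hypothetical assumptions, not the actual scripts described above.

```python
"""Hypothetical helpers: build cluster config text from a list of instances.

Each instance is assumed to be a (hostname, private_ip) pair; the names,
IPs, slot counts and NFS options are illustrative only.
"""


def hosts_file(instances):
    # One "IP<TAB>hostname" line per instance, in /etc/hosts format.
    return "\n".join(f"{ip}\t{name}" for name, ip in instances) + "\n"


def openmpi_hostfile(instances, slots):
    # Open MPI hostfile syntax: "<host> slots=<n>", one host per line.
    return "\n".join(f"{name} slots={slots}" for name, _ in instances) + "\n"


def nfs_exports(path, instances, nfs_server):
    # A single /etc/exports line sharing `path` with every node except the
    # NFS server itself (assumed to be the VM the EBS volume is attached to).
    clients = " ".join(
        f"{name}(rw,sync,no_subtree_check)"
        for name, _ in instances
        if name != nfs_server
    )
    return f"{path} {clients}\n"


if __name__ == "__main__":
    nodes = [("node01", "10.0.0.11"), ("node02", "10.0.0.12"),
             ("node03", "10.0.0.13"), ("node04", "10.0.0.14")]
    print(hosts_file(nodes), end="")
    print(openmpi_hostfile(nodes, slots=1), end="")
    print(nfs_exports("/home", nodes, nfs_server="node01"), end="")
```

Writing the output into /etc/hosts on each node, passing the hostfile to `mpirun --hostfile`, and running `exportfs -ra` on the NFS server after updating /etc/exports are left as separate steps; adding user accounts is not covered here.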

I can also choose which distro, and indeed which OS, to run. The default is FC8 but it's easy enough to pick something else; I tend to flip between FC8, Debian and Solaris every few weeks. This is mostly to ensure my code is well tested on different systems, although it does mean re-compiling everything each time I switch, which can take a while.

I also noticed that over-committing virtual machines doesn't have the same negative impact as over-committing the CPUs on real machines. Sure, application performance plummets in either case, but the virtual machine is still usable, whereas a real machine can stop responding almost completely. This means I can over-commit my VMs by running 32 processes per node and run 512-process jobs (16 instances) at a cost of only $1.36 an hour. That's cheap enough to try something, see if it works and not have to worry about the cost.
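The $1.36 figure can be checked with a little arithmetic: 512 processes at 32 per node needs 16 instances, and 16 instances at 8.5 cents per instance-hour (the rate that figure implies) comes to $1.36 an hour. A minimal Python sketch of the same calculation follows; the function name is hypothetical, and it assumes a flat per-instance hourly rate and ignores EBS storage charges.

```python
import math


def overcommit_cost(total_procs, procs_per_node, hourly_rate):
    # Instances needed when each one hosts procs_per_node processes;
    # round up so a partial node still counts as a whole instance.
    nodes = math.ceil(total_procs / procs_per_node)
    # Flat per-instance-hour pricing is assumed (no storage or transfer costs).
    return nodes, nodes * hourly_rate


nodes, cost = overcommit_cost(512, 32, 0.085)
print(f"{nodes} instances, ${cost:.2f}/hour")  # prints "16 instances, $1.36/hour"
```

Rounding up matters for job sizes that don't divide evenly: 100 processes at 32 per node still needs four instances.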

In short, Amazon makes a really good development or test system for small-scale clusters; it's good for testing code correctness and experimenting with different distros. I'm not convinced about the performance, or about the cost-effectiveness for larger or longer-running applications, but as a place to start it's ideal.



Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
