[Beowulf] clustering using xen virtualized machines

Peter Clapham pc7 at sanger.ac.uk
Tue Jan 26 07:24:25 PST 2010

On the AWS ec2 side, we've been performing a range of tests including 
full genome sequencing pipelines across varying numbers of nodes and 
storage. The biggest challenge to date has been IO, particularly if the 
smaller image systems are used. Where jobs are highly CPU bound, with 
little network (or, heaven forbid, disk) activity, things go reasonably 
well and have the potential to scale. Once IO becomes a factor, the 
scaling decreases.
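
To put rough numbers on that scaling point, here is a toy Amdahl-style 
sketch. It assumes the CPU part of a job divides evenly across nodes while 
the IO part does not speed up at all. The function name and the IO 
fractions are made up for illustration; none of this comes from our 
measurements.

```python
def speedup(nodes, io_fraction):
    """Amdahl-style estimate: CPU work divides across nodes, IO work does not.

    io_fraction is the (hypothetical) share of single-node runtime spent on IO.
    """
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + cpu_fraction / nodes)


if __name__ == "__main__":
    # Illustrative IO shares only, not measured values.
    for io_fraction in (0.0, 0.05, 0.25):
        row = ", ".join(
            f"{n} nodes: {speedup(n, io_fraction):.1f}x" for n in (1, 8, 64)
        )
        print(f"io_fraction={io_fraction:.2f} -> {row}")
```

Under these assumptions, a job that spends a quarter of its time on IO 
cannot get past a 4x speedup however many nodes are added, which is the 
flavour of what we see once IO becomes a factor. Real behaviour on shared 
storage can be worse, since contention tends to grow with node count.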

We've also had a run around with Xen. It requires more network 
fiddling to automate rollouts (at least in our environment), but it 
works OK, especially when paired with something like openQRM. It's a 
ways off being as polished as VMware, and some of the interesting memory 
handling doesn't appear to be all there. As a result, performance 
degrades fairly severely as the number of hosts and the IO-hungry app 
load increase. Regrettably, I don't have enough useful data to present 
on this at the moment and, as always, YMMV.

> I've been using Amazon ec2 for clustering for months now, from a software perspective it's very similar to running real hardware.  For my needs (development) it's perfectly adequate, I've not benchmarked it against running the same code on the raw hardware though.
