[Beowulf] cloudy HPC?

Mark Hahn hahn at mcmaster.ca
Thu Jan 30 12:57:14 PST 2014


Hi all,
I would be interested to hear any comments you have about 
delivering HPC services on a "cloudy" infrastructure.

What I mean is: suppose there is a vast datacenter filled 
with beautiful new hosts, plonkabytes of storage and all 
sitting on the most wonderful interconnect.  One could run 
the canonical HPC stack on the bare metal (which is certainly
what we do today), but would there be any major problems/overhead 
if it were only used to run VMs?

By "HPC services", I mean a very heterogeneous mixture of 
serial, bigdata, fatnode/threaded, tightly coupled MPI, perhaps
even GPGPU workloads, from hundreds of different groups.

For instance, I've heard complaints that MPI over a virtualized 
interconnect is slow.  But VM infrastructure like KVM can give 
device ownership to the guest, so IB access *could* be bare-metal.
(If security is a concern, perhaps it could be implemented at the 
SM (subnet manager) level.  OTOH, the usual sort of shared PaaS HPC 
doesn't care much about interconnect security...)
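
To make the "device ownership" idea concrete, here is a minimal sketch,
assuming the VMs are managed through libvirt (nothing above requires that)
and that the IB HCA is handed to the guest by PCI passthrough.  The helper
name hostdev_xml and the PCI address 0000:03:00.0 are hypothetical and only
for illustration; the function builds the <hostdev> fragment that would go 
into a guest's domain definition.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(pci_address: str) -> str:
    """Build a libvirt <hostdev> element for a PCI device.

    pci_address is in the usual lspci form 'DDDD:BB:SS.F'
    (domain:bus:slot.function), e.g. '0000:03:00.0'.
    """
    domain, bus, rest = pci_address.split(":")
    slot, function = rest.split(".")

    # managed='yes' asks libvirt to detach the device from the host
    # driver when the guest starts and reattach it afterwards.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical address of the host's IB adapter.
    print(hostdev_xml("0000:03:00.0"))
```

With the device passed through this way the guest drives the HCA directly,
which is why the interconnect path need not go through a virtual NIC.  Whether
that holds up across hundreds of groups sharing hosts is exactly the question.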

I'm not really interested in particulars of, for instance, 
bursting workloads using the EC2 spot market.  I know the numbers:
anyone with a clue can run academic/HPC-tuned facilities at a 
fraction of commercial prices.  I also know that clusters and 
datacenters are largely linear in cost once you get to a pretty 
modest size (say 20 racks).

If you're interested in why I'm asking, it's because Canada is 
currently trying to figure out its path forward in "cyberinfrastructure".
I won't describe the current sad state of Canadian HPC, except to say 
that it's hard to imagine *anything* that wouldn't be an improvement ;)
It might be useful, politically, practically, optically, to split
off hardware issues from the OS-up stack.  Doing this would at the 
very least make a perfectly clear delineation of costs, since the 
HW-host level has a capital cost, some space/power/cooling/service 
costs, no software costs, and almost no people costs.  The OS-up part
is almost entirely people costs, since only a few kinds of research 
require commercial software.

thanks, Mark Hahn.


