[Beowulf] cloudy HPC?

Thu Jan 30 13:15:22 PST 2014

Given prior art, I'm totally taking the fifth on this one ;-) hehehe!

http://marc.info/?l=beowulf&m=136219891020783&w=1

But all joking aside, I'm still super keen to understand all of this
at a much deeper level, certainly within the .edu space.

BTW: Chris Loken did an awesome job of explaining some of the .ca HPC
infrastructure and made mention of some really good strategic
investments that have been made in the past at our latest meeting down
at the NCSA.  The slides are posted:

http://www.ncsa.illinois.edu/Conferences/ARCC/agenda.html

Best,

j.

--
dr. james cuff, assistant dean for research computing, harvard
university | division of science | thirty eight oxford street,
cambridge. ma. 02138 | +1 617 384 7647 | http://rc.fas.harvard.edu

On Thu, Jan 30, 2014 at 3:57 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> Hi all,
> I would be interested to hear any comments you have about delivering HPC services on a "cloudy" infrastructure.
>
> What I mean is: suppose there is a vast datacenter filled with
> beautiful new hosts, plonkabytes of storage and all sitting on the
> most wonderful interconnect.  One could run the canonical HPC stack
> on the bare metal (which is certainly what we do today), but would
> there be any major problems/overhead if it were only used to run VMs?
>
> By "HPC services", I mean a very heterogeneous mixture of serial,
> bigdata, fatnode/threaded, tightly coupled MPI, perhaps even GP-GPU
> stuff from hundreds of different groups, etc.
>
> For instance, I've heard some complaints that MPI over a virtualized
> interconnect is slow.  But VM infrastructure like KVM can give device
> ownership to the guest, so IB access *could* be bare-metal.  (If
> security is a concern, perhaps it could be implemented at the SM
> level.  OTOH, the usual sort of shared PaaS HPC doesn't care much
> about interconnect security...)
>
> I'm not really interested in particulars of, for instance, bursting
> workloads using the EC2 spot market.  I know the numbers: anyone with
> a clue can run academic/HPC-tuned facilities at a fraction of
> commercial prices.  I also know that clusters and datacenters are
> largely linear in cost once you get to a pretty modest size (say 20
> racks).
>
> If you're interested in why I'm asking, it's because Canada is
> currently trying to figure out its path forward in
> "cyberinfrastructure".  I won't describe the current sad state of
> Canadian HPC, except that it's hard to imagine *anything* that
> wouldn't be an improvement ;)
>
> It might be useful, politically, practically, optically, to split
> off hardware issues from the OS-up stack.  Doing this would at the
> very least make a perfectly clear delineation of costs, since the
> HW-host level has a capital cost, some space/power/cooling/service
> costs, no software costs, and almost no people costs.  The OS-up part
> is almost entirely people costs, since only a few kinds of research
> require commercial software.
>
> thanks, Mark Hahn.