[Beowulf] experience with HPC running on OpenStack
stig at stackhpc.com
Thu Jul 9 03:54:42 PDT 2020
I think this is all true, but...
There's a strong theme of "choose your own adventure" to designing performant OpenStack infrastructure. Consider it a continuum of convenience vs performance: at one end we get all the flexibility of cloud, and at the other we get all the performance of bare metal (and most of the flexibility of cloud). Systems like Lance's provide software-defined infrastructure, but with RDMA and GPU support - each OpenStack deployment represents a different trade-off along this continuum.
I still think you're right that codes that scale poorly on bare metal will scale worse on VMs. If a hypercall boundary is on the critical path it's going to have an impact. The question is by how much (or little) and that's something we can't easily generalise, other than to remark that the overhead has reduced as technology evolves.
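To make the "by how much" question concrete, here is a small sketch (my own illustration, not a measurement): a simple Amdahl strong-scaling model with a hypothetical per-core overhead term standing in for virtualisation cost on the critical path. The serial fraction and overhead values are assumptions chosen purely to show the shape of the effect.

```python
# Sketch: strong-scaling speedup under Amdahl's law, with a hypothetical
# extra per-core overhead term to model virtualisation cost (e.g. hypercalls
# on the critical path). Numbers are illustrative assumptions, not benchmarks.

def amdahl_speedup(cores, serial_fraction, overhead_per_core=0.0):
    """Amdahl speedup on `cores` cores, optionally penalised by a
    per-core overhead expressed as a fraction of single-core runtime."""
    parallel_fraction = 1.0 - serial_fraction
    runtime = (serial_fraction
               + parallel_fraction / cores
               + overhead_per_core * cores)
    return 1.0 / runtime

# A code that is 5% serial tops out below 20x even on bare metal; a small
# per-core overhead (here 0.05% per core) caps the speedup far sooner.
for n in (1, 8, 64, 512):
    bare = amdahl_speedup(n, 0.05)
    vm = amdahl_speedup(n, 0.05, overhead_per_core=0.0005)
    print(f"{n:4d} cores: bare-metal-like {bare:6.2f}x, VM-like {vm:6.2f}x")
```

The point of the sketch is Mark's below: the gap between the two curves is invisible at low core counts and dominant at high ones, which is why a serial or trivially parallel benchmark on a VM tells you little about a tightly coupled code at scale.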
There's been some interesting discussion about the combination of bare metal and VMs in a single cloud infrastructure and that's an area where we see a lot of potential.
Perhaps in due course we'll view cloud in a similar context to other abstractions - virtual memory, logical volumes, filesystems, etc. - each with overheads that we are (usually) happy to trade.
I should disclose that I am of course biased in this matter.
> On 9 Jul 2020, at 08:42, <m.somers at chem.leidenuniv.nl> wrote:
> i suggest you not only look at the flexibility / complexity regarding
> administrating a cluster with OpenStack [there are also many other tools
> for that] but also *actually benchmark* with a parallel (threaded) code
> you know well and check the strong scaling via Amdahl's law by making some
> speedup graphs on VMs and baremetal. You might actually be throwing away a
> lot of raw cpu power for *real* HPC codes by doing things on VMs. Serial
> on VMs != parallel on VMs: It all depends on your codes and the VM details
> P.S. in the paper above, the mentioned floating point micro benchmarks are
> 'trivially parallel': all cores work independently without any
> communication and do not share a single bigger workload. This is certainly
> not the case for actual HPC codes, and these will not show such linear,
> near-perfect scaling on either VMs or bare metal.
> Mark Somers
> tel: +31715274437
> mail: m.somers at chem.leidenuniv.nl
> web: https://www.universiteitleiden.nl/en/staffmembers/mark-somers
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf