[Beowulf] experience with HPC running on OpenStack [EXT]

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Wed Jul 8 02:19:06 PDT 2020

Hi Tim,

many thanks for sharing your experiences here, and sorry for my slow reply: I 
am currently on annual leave and thus don't check emails on a daily basis. 

The additional layer of complexity is probably the price you have to pay for 
the flexibility. We are aiming for a more flexible system so we can move 
things around, similar to what you are doing. 

I might come back to this later if you don't mind.



Am Mittwoch, 1. Juli 2020, 12:13:05 BST schrieb Tim Cutts:
> Here, we deploy some clusters on OpenStack, and some traditionally as bare
> metal.   Our largest cluster is actually a mixture of both, so we can
> dynamically expand it from the OpenStack service when needed.
> Our aim eventually is to use OpenStack as a common deployment layer, even
> for the bare metal cluster nodes, but we’re not quite there yet.
> The main motivation for this was to have a common hardware and deployment
> platform, and have flexibility for VM and batch workloads.  We have needed
> to dynamically change workloads (for example in the current COVID-19
> crisis, our human sequencing has largely stopped and we’ve been
> predominantly COVID-19 sequencing, using an imported pipeline from the
> consortium we’re part of).  Using OpenStack we could get that new pipeline
> running in under a week, and later moved it from the research to the
> production environment, reallocating research resources back to their
> normal workload.
> There certainly are downsides; OpenStack is a considerable layer of
> complexity, and we have had occasional issues, although those rarely affect
> established running VMs (such as batch clusters).  Those occasional
> problems are usually in the services for dynamically creating and
> destroying resources, so they don’t have immediate impact on batch
> clusters.  Plus, we tend to use fairly static provider networks to connect
> the Lustre systems to virtual clusters, which removes another layer of
> OpenStack complexity.
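For what it's worth, the kind of static provider network Tim describes can be sketched with the openstack CLI roughly as below. The physical network label, resource names and CIDR are all invented for illustration, not taken from his setup:

```shell
# Map a Neutron provider network directly onto the physical network that
# carries Lustre traffic, so virtual clusters reach the filesystem without
# going through Neutron virtual routers.  'lustrephys' and the CIDR are
# placeholders.
openstack network create lustre-net \
    --provider-network-type flat \
    --provider-physical-network lustrephys \
    --share

# No DHCP: addresses on a storage network are often managed statically.
openstack subnet create lustre-subnet \
    --network lustre-net \
    --subnet-range 10.20.0.0/16 \
    --no-dhcp
```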
> Generally speaking it’s working pretty well, and we have uptimes in excess
> of 99.5%.
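As a back-of-the-envelope check (my arithmetic, using only the 99.5% figure quoted above), that uptime still permits a fair amount of downtime per year:

```shell
# 99.5% uptime => up to 0.5% of the year down; in hours:
awk 'BEGIN { printf "%.1f\n", 365 * 24 * (1 - 0.995) }'
# → 43.8
```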
> Tim
> On 1 Jul 2020, at 05:09, John Hearns <hearnsj at gmail.com> wrote:
> Jorg, I would back up what Matt Wallis says. What benefits would OpenStack
> bring you? Do you need to set up a flexible infrastructure where clusters
> can be created on demand for specific projects?
> Regarding Infiniband the concept is SR-IOV. This article is worth reading:
> https://docs.openstack.org/neutron/pike/admin/config-sriov.html
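The guide linked above boils down to attaching the VM to a virtual function of the NIC via a port with `--vnic-type direct`. A minimal sketch with the openstack CLI might look like this (the network, flavor and image names are invented placeholders):

```shell
# Create a port backed by an SR-IOV virtual function, then boot the
# instance with that port attached.  'sriov-net', 'hpc.large' and
# 'centos8' are hypothetical resource names.
openstack port create sriov-port \
    --network sriov-net \
    --vnic-type direct

openstack server create mpi-node-01 \
    --flavor hpc.large \
    --image centos8 \
    --port sriov-port
```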
> I would take a step back and look at your storage technology and decide
> which is the best one going forward. Also look at the proceedings of the
> last STFC Computing Insights, where Martyn Guest presented a lot of
> benchmarking results on AMD Rome.
> Page 103 onwards in this report:
> http://purl.org/net/epubs/manifestation/46387165/DL-CONF-2020-001.pdf
> On Tue, 30 Jun 2020 at 12:21, Jörg Saßmannshausen
> <sassy-work at sassy.formativ.net> wrote:
> Dear all,
> we are currently planning a new cluster, and this time around the idea was
> to use OpenStack for the HPC part of the cluster as well.
> I was wondering if somebody on the list here has some first-hand
> experience with this.
> One of the things we are currently not so sure about is InfiniBand (or
> another low-latency network connection, but not Ethernet): can you run HPC
> jobs on OpenStack which require more than the number of cores within a
> single box? I am thinking of programs like CP2K, GROMACS and NWChem (if
> those sound familiar to you), which utilise this kind of network very well.
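Before trying a full CP2K or GROMACS run, a quick way to confirm that an MPI job really spans several VMs is to launch one rank per slot and have each rank print its hostname. The hostfile contents below are invented examples, not real nodes:

```shell
# Hypothetical two-VM hostfile (Open MPI syntax).
cat > hosts.txt <<EOF
vm-node-01 slots=8
vm-node-02 slots=8
EOF

# Each rank prints the host it landed on; the output should show
# both VM hostnames if the job is genuinely crossing node boundaries.
mpirun --hostfile hosts.txt -np 16 hostname
```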
> I came across things like Magic Castle from Compute Canada, but as far as I
> understand it, they are not using it for production (yet).
> Is anybody on here familiar with this?
> All the best from London
> Jörg
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
