[Beowulf] HPC in the cloud question
Hutcheson, Mike
Mike_Hutcheson at baylor.edu
Thu May 7 15:28:11 PDT 2015
Hi. We are working on refreshing the centralized HPC cluster resources
that our university researchers use. I have been asked by our
administration to look into HPC-in-the-cloud offerings as an alternative to
purchasing or running a cluster on-site.
We currently run a 173-node, CentOS-based cluster with ~120TB of storage
(soon to increase to 300+TB) in our datacenter. It's a standard cluster
configuration: IB network, distributed file system (BeeGFS, which I really
like), Torque/Maui batch. Our users run a varied workload, from
fine-grained, MPI-based parallel apps scaling to 100s of cores to
coarse-grained, high-throughput jobs (we're a CMS Tier-3 site) with high
I/O requirements.
Whatever we transition to, whether it be a new in-house cluster or
something "out there", I want to minimize the amount of change or learning
curve our users would have to experience. They should be able to focus on
their research and not have to spend a lot of their time learning a new
system or trying to spin one up each time they have a job to run.
If you have worked with HPC in the cloud, either as an admin or as
someone who has used cloud resources for research computing, I would
appreciate hearing about your experience.
Even if you haven't used the cloud for HPC, please feel free to
share your thoughts or concerns on the matter.
Sort of along those same lines, what are your thoughts about leasing a
cluster and running it on-site?
Thanks for your time,
Mike Hutcheson
Assistant Director of Academic and Research Computing Services
Baylor University