[Beowulf] HPC in the cloud question
Gavin W. Burris
bug at wharton.upenn.edu
Fri May 8 08:08:21 PDT 2015
Hi, Mike.
We have been using StarCluster for some time to deploy a separate
cluster in the cloud for each user. We update a custom CentOS 7 AMI that
allows us to maintain binary compatibility with our Wharton HPCC system.
This approach can be staff-time intensive and can require user training
for deploying an application, launching a cluster, and moving data
around.
We have also deployed Univa's cloud-bursting solution, UniCloud. I'm a
fan of this product and its approach. It wraps Grid Engine with a
policy engine that launches cloud nodes as needed, for example when jobs
are waiting in a specific queue. This allows existing users to log into
our regular system and use the normal commands with a few extra aliases.
The learning curve for users is much gentler here: staff do a one-time
account and billing setup, then users can qsub/qlogin jobs that use
their own AWS cloud queue. Jobs are submitted out of a user's cloud home
directory via NFS over VPC.
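To make the "launch cloud nodes as needed" idea concrete, here is a
rough sketch of what a queue-driven bursting rule could look like. This
is not UniCloud's actual implementation or API; the function name and
all four parameters are hypothetical, chosen only for illustration.

```python
import math


def nodes_to_launch(pending_slots, running_nodes, slots_per_node, max_nodes):
    """Return how many extra cloud nodes to start for one queue's backlog.

    Hypothetical policy, for illustration only:
      pending_slots   job slots currently waiting in the cloud queue
      running_nodes   cloud nodes already up for this queue
      slots_per_node  job slots that a single cloud node provides
      max_nodes       per-user cap on cloud nodes (e.g. a billing limit)
    """
    if pending_slots <= 0:
        return 0  # nothing is waiting, so do not burst
    # Enough nodes to cover the backlog, rounding up partial nodes.
    needed = math.ceil(pending_slots / slots_per_node)
    # Never exceed the cap, even if nodes are already over it.
    headroom = max(max_nodes - running_nodes, 0)
    return min(needed, headroom)
```

A real policy engine would also have to decide when to shut idle nodes
down again, which this sketch leaves out.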
If you have questions, I'm happy to answer. We've figured out quite a
few of the usability issues with some friendly initial users.
Cheers.
On 10:28PM Thu 05/07/15 +0000, Hutcheson, Mike wrote:
> Hi. We are working on refreshing the centralized HPC cluster resources
> that our university researchers use. I have been asked by our
> administration to look into HPC in the cloud offerings as a possible
> alternative to purchasing or running a cluster on-site.
>
> We currently run a 173-node, CentOS-based cluster with ~120TB (soon to
> increase to 300+TB) in our datacenter. It's a standard cluster
> configuration: IB network, distributed file system (BeeGFS. I really
> like it), Torque/Maui batch. Our users run a varied workload, from
> fine-grained, MPI-based parallel apps scaling to 100s of cores to
> coarse-grained, high-throughput jobs (We're a CMS Tier-3 site) with high
> I/O requirements.
>
> Whatever we transition to, whether it be a new in-house cluster or
> something "out there", I want to minimize the amount of change or learning
> curve our users would have to experience. They should be able to focus on
> their research and not have to spend a lot of their time learning a new
> system or trying to spin one up each time they have a job to run.
>
> If you have worked with HPC in the cloud, whether as an admin or as
> someone who has used cloud resources for research computing purposes, I
> would appreciate learning about your experience.
>
> Even if you haven't used the cloud for HPC, please feel free to
> share your thoughts or concerns on the matter.
>
> Sort of along those same lines, what are your thoughts about leasing a
> cluster and running it on-site?
>
> Thanks for your time,
>
> Mike Hutcheson
> Assistant Director of Academic and Research Computing Services
> Baylor University
--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania