[Beowulf] HPC in the cloud question

Gavin W. Burris bug at wharton.upenn.edu
Fri May 8 08:08:21 PDT 2015


Hi, Mike.

We have been using StarCluster for some time to deploy separate clusters
in the cloud, one per user.  We keep a custom CentOS 7 AMI updated so we
can maintain binary compatibility with our Wharton HPCC system.  This
solution can be staff-time intensive and/or require user training for
deploying an application, launching a cluster, and moving data around.
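
As a rough sketch of that per-user workflow (the template and cluster
names here are placeholders, not our actual configuration), it looks
something like:

    # launch a personal cluster from a template in ~/.starcluster/config
    starcluster start -c wharton userA-cluster

    # stage input data and open a shell on the master node
    starcluster put userA-cluster input.tar.gz /home/userA/
    starcluster sshmaster userA-cluster

    # tear the cluster down when the run is finished
    starcluster terminate userA-cluster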

We have also deployed Univa's cloud-bursting solution, UniCloud.  I'm a
fan of this product and its approach.  It wraps Grid Engine in a policy
engine that launches cloud nodes as needed, for example when jobs are
waiting in a specific queue.  This allows existing users to log into our
regular system and use the normal commands with a few extra aliases.
The learning curve is much gentler here: staff do a one-time account and
billing setup, then users can qsub/qlogin jobs that run in their own AWS
cloud queue.  Jobs are submitted out of a user's cloud home directory,
served via NFS over the VPC.
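
To give a sense of the user experience (the queue name below is just
illustrative, not our production setup):

    # submit a batch job to the cloud-backed queue
    qsub -q aws.q run_model.sh

    # or request an interactive session on a cloud node
    qlogin -q aws.q

UniCloud sees the pending work in that queue, launches EC2 instances to
satisfy it, and, depending on policy, retires them once the queue drains.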

If you have questions, I'm happy to answer.  We've figured out quite a
few of the usability issues with some friendly initial users.

Cheers.

On 10:28PM Thu 05/07/15 +0000, Hutcheson, Mike wrote:
> Hi.  We are working on refreshing the centralized HPC cluster resources
> that our university researchers use.  I have been asked by our
> administration to look into HPC in the cloud offerings as a possibility to
> purchasing or running a cluster on-site.
> 
> We currently run a 173-node, CentOS-based cluster with ~120TB (soon to
> increase to 300+TB) in our datacenter.  It's a standard cluster
> configuration:  IB network, distributed file system (BeeGFS; I really
> like it), Torque/Maui batch.  Our users run a varied workload, from
> fine-grained, MPI-based parallel apps scaling to 100s of cores to
> coarse-grained, high-throughput jobs (we're a CMS Tier-3 site) with high
> I/O requirements.
> 
> Whatever we transition to, whether it be a new in-house cluster or
> something "out there", I want to minimize the amount of change or learning
> curve our users would have to experience.  They should be able to focus on
> their research and not have to spend a lot of their time learning a new
> system or trying to spin one up each time they have a job to run.
> 
> If you have worked with HPC in the cloud, either as an admin and/or
> someone who has used cloud resources for research computing purposes, I
> would appreciate learning your experience.
> 
> Even if you haven't used the cloud for HPC computing, please feel free to
> share your thoughts or concerns on the matter.
> 
> Sort of along those same lines, what are your thoughts about leasing a
> cluster and running it on-site?
> 
> Thanks for your time,
> 
> Mike Hutcheson
> Assistant Director of Academic and Research Computing Services
> Baylor University
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania

