[Beowulf] A 10,000-node Grid Engine Cluster in Amazon EC2

Mon Dec 17 10:13:23 PST 2012

On Thu, Dec 6, 2012 at 12:53 PM, Chi Chan <chichan2008 at gmail.com> wrote:
> On Wed, Nov 28, 2012 at 12:10 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
>> 1) We ran a 10,000-node cluster on Amazon EC2 for Grid Engine
>> scalability testing a few weeks ago:
>>
>> http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
>
> I saw some other large clusters, and most used a larger number of cores
> and fewer nodes.

The main goal of the 10,000-node cluster was to stress the networking
layer of Grid Engine. In Grid Engine, commlib is the low-level library
that handles communication between Grid Engine nodes. In 2005,
when Hess Corp added more nodes to their Grid Engine cluster, they
found that the cluster stopped working at around 1000 nodes. (That
commlib bug was worked around by Sun and finally fixed by Ron & me; the
fix is now used in every fork of Grid Engine.)

There are some performance issues that we would like to fix before we
run something even larger (like 20,000 nodes and beyond :-D ), and I
think we are hitting the "C10K problem" that was encountered by web
servers a few years ago!

>
> For example, Numerate’s Drug Design Platform Scales to 10,000+ Cores

Running 10,000 cores in EC2 is way easier than booting up 10,000
nodes. If all you need is 10,000 cores, then with cc2.8xlarge
(Cluster Compute Eight Extra Large Instance), which has 16 Intel Xeon
E5-2670 cores per VM, you only need around 625 instances (instance =
VM). With MIT StarCluster, a 100-instance cluster can be provisioned
with NFS, Open Grid Scheduler/Grid Engine, user accounts, MPI
libraries, etc. in less than 10 minutes.
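
A rough sketch of the instance count above, assuming every instance
contributes the same number of cores (the function name is made up for
illustration and is not part of any EC2 or StarCluster API):

```python
import math


def instances_needed(target_cores: int, cores_per_instance: int) -> int:
    """Smallest number of instances that provides at least target_cores."""
    return math.ceil(target_cores / cores_per_instance)


# cc2.8xlarge has 16 cores per VM (figure taken from the text above).
print(instances_needed(10_000, 16))
```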

Since the overhead to provision an instance is the same whether it be
a small instance with 1 core, or a large instance with 16 cores, we
can use a larger instance type and get 160,000 cores in EC2 to form
the 10,000-node cluster in the same amount of time. On the other hand,
the scheduler logic will now see 16 times more job slots than before,
so it will then stress the scheduler even more.
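
To make the slot count concrete, here is a minimal sketch. It assumes
one job slot per core, which is a common Grid Engine setup but not
something this test configuration is stated to use; the function name
is hypothetical.

```python
def total_job_slots(nodes: int, cores_per_node: int) -> int:
    """Job slots seen by the scheduler, assuming one slot per core."""
    return nodes * cores_per_node


small = total_job_slots(10_000, 1)    # 1-core instances
large = total_job_slots(10_000, 16)   # 16-core cc2.8xlarge instances
print(small, large, large // small)
```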

> Using Spot Instances:
>
> http://numerate.com/blog/?p=155

We also used spot instances to lower the cost. The spot price for
cc2.8xlarge is $0.27/hr, so one can get the same cluster at $2,700/hr
if the jobs can be restarted without issues. The standard price is
$2.40/hr for the 16-core cc2.8xlarge, which comes to $24,000/hr for the
10,000-node, 160,000-core cluster. That can be very expensive or very
cheap depending on how you use the cluster: if all you need is
to run something large, get the results quickly, and then
let the cluster sit idle for a few months, then IMO Cloud HPC is the
best choice!
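
The cost arithmetic can be checked with a short sketch. The two
per-instance prices are the 2012 figures quoted in this message (spot
prices fluctuate, so treat them as a snapshot); the function name is
hypothetical.

```python
from decimal import Decimal


def cluster_cost_per_hour(instances: int, price_per_instance_hour: str) -> Decimal:
    """Hourly cost of the whole cluster, using exact decimal arithmetic."""
    return Decimal(price_per_instance_hour) * instances


spot = cluster_cost_per_hour(10_000, "0.27")       # spot price from the text
standard = cluster_cost_per_hour(10_000, "2.40")   # standard price from the text
savings = 1 - spot / standard                      # fraction saved by using spot

print(f"spot: ${spot}/hr, standard: ${standard}/hr, savings: {savings:.2%}")
```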

(Note: As a hack, if the cluster is too large, then one could break it
up into smaller clusters, but then it is not a *real* large cluster,
as the number of job slots seen by the scheduler logic is way less!)

> And I think they are also using bare-metal instead of VM, so it can
> benefit a few types of parallel apps that require low latency.

Yes, that's one of the advantages of Gompute - they offer bare-metal
machines (instead of VMs as in EC2), so the machines are not shared. In
EC2, if you run in a VPC (Virtual Private Cloud), then you can
request "Dedicated Instances", but the cost is quite high if all
you need is just 1 or 2 instances. However, if you have thousands of
nodes, then $10/hr is nothing compared to the total cost.

http://aws.typepad.com/aws/2011/03/amazon-ec2-dedicated-instances.html
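
A rough sketch of why the $10/hr matters for 1 or 2 instances but not
for thousands. It assumes the $10/hr is a flat fee that does not grow
with the number of instances, reuses the $2.40/hr figure quoted earlier
in this thread, and ignores any other difference in dedicated-instance
pricing; the function name is hypothetical.

```python
def fee_share(instances: int, price_per_instance_hour: float,
              flat_fee_per_hour: float = 10.0) -> float:
    """Fraction of the total hourly bill that goes to the flat fee."""
    instance_cost = instances * price_per_instance_hour
    return flat_fee_per_hour / (instance_cost + flat_fee_per_hour)


for n in (2, 100, 10_000):
    print(f"{n:>6} instances: fee is {fee_share(n, 2.40):.2%} of the bill")
```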

Rayson

P.S. I will follow up with you offline if you have further questions...

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

>
> --Chi
>
>
>>
>>
>> 3) Lastly, we should also mention StarCluster from MIT. While it is
>> not backed by any single vendor, StarCluster is used by lots of
>> companies - for example the BioTeam recommends it, and we also use it
>> for some of our Grid Engine testing as well! If one just needs a small
>> to medium cluster, then StarCluster can provision it for you in EC2
>> very quickly; for example, a 100-node cluster could be installed in
>> around 10 minutes - and that was using spot instances, which have a
>> slightly higher start time due to the bidding process.
>>
>> Rayson
>>
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>>
>>
>> On Fri, Oct 26, 2012 at 5:26 PM, Douglas Eadline <deadline at eadline.org> wrote:
>>> Since the North East Coast (yea, the capitals are there for you Lux)
>>> will be under some clouds this weekend I thought my
>>> recent survey of "HPC Cloud" offerings may be of
>>> interest. (notice the quotes)
>>>
>>>    Moving HPC to the Cloud
>>>      http://hpc.admin-magazine.com/Articles/Moving-HPC-to-the-Cloud
>>>
>>> Some cold water to dump on your head maybe found here:
>>>
>>>    Will HPC Work In The Cloud?
>>>      http://clustermonkey.net/Grid/will-hpc-work-in-the-cloud.html
>>>
>>>
>>> --
>>> Doug
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf