[Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud

Chris Dagdigian dag at sonsorol.org
Tue Oct 4 12:29:28 PDT 2011


I'm largely with RGB on this one with the minor caveat that I think he 
might be undervaluing the insane economies of scale that IaaS providers 
like Amazon & Google can provide.

At the scale Amazon operates, they can obtain and run infrastructure 
far, far more efficiently than most (if not all) of us can ourselves. 
These folks have exabytes of spinning disk, redundant data-centers (with 
insanely good PUE values) all over the world and they know how to manage 
hundreds of thousands of servers with high efficiency in a very hostile 
networking environment. Not only can they run bigger and more 
efficiently than we can, they can charge a price that makes them a 
profit while still being (in many cases) far cheaper than my own costs 
would be if I were truly honest about the fully-loaded costs of 
maintaining HPC or IT services.

AWS has a history of lowering prices as their own costs go down. You can 
see this via the EC2 pricing history as well as the now-down-to-zero 
cost of inbound data transit.

AWS Spot market makes this even more interesting. I can currently run an 
m1.4xlarge 64bit server instance with 15GB RAM for about $.24 per hour - 
close to 50% cheaper than the published hourly price and that spot price 
can hold steady for weeks at a time in many cases.

The biggest hangup is the economics. They are even harder to pin down in 
an academic environment where researchers are used to seeing their funds 
vanish to "overhead" on their grant or they just assume that 
datacenters, bandwidth, power and hosting are all "free" to use.

It's hard to do true cost comparisons but time and time again I've seen 
IaaS come out ahead when the fully-loaded costs are actually put down on 
paper.

Here is a cliche example: Amazon S3

Before the S3 object storage service will even *acknowledge* a 
successful PUT request, your file is already at rest in at least three 
Amazon facilities.

So to "really" compare S3 against what you can do locally you at least 
have to factor in the cost of your organization being able to provide 3x 
multi-facility replication for whatever object store you choose to deploy...
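
A rough sketch of that comparison in Python. Every number below is a 
made-up placeholder, not real pricing from S3 or anyone else, and the 
function name is purely illustrative. The only point is that the local 
per-GB figure has to be multiplied by the number of copies and loaded 
with overhead before it is set against a hosted price.

```python
def local_replicated_cost_per_gb(raw_cost_per_gb, copies=3, overhead_fraction=0.0):
    """Monthly cost per usable GB when every object is stored `copies` times.

    raw_cost_per_gb   -- hardware-only cost of one copy (hypothetical input)
    copies            -- number of replicas, one per facility
    overhead_fraction -- extra share for power, space, network and staff
    """
    return raw_cost_per_gb * copies * (1.0 + overhead_fraction)


# Placeholder inputs, chosen only to show the arithmetic.
raw_cost = 0.03        # $/GB-month for a single local copy (assumed)
hosted_price = 0.12    # $/GB-month for a hosted object store (assumed)

local_cost = local_replicated_cost_per_gb(raw_cost, copies=3, overhead_fraction=0.5)

print(f"local, 3 copies, fully loaded: ${local_cost:.3f} per GB-month")
print(f"hosted (placeholder price):    ${hosted_price:.3f} per GB-month")
```

Plug in your own numbers; the shape of the calculation matters more than 
the placeholders.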

I don't want to be seen as a shill so I'll stop with that example. The 
results really are surprising once you start down the "true cost of IT 
services..." road.


As for industry trends with HPC and IaaS ...

I can assure you that in the super practical & cynical world of biotech 
and pharma there is an HPC migration to IaaS platforms that is already 
years old. It's a lot easier to see where and how your money is being 
spent inside a biotech startup or pharma and that has shunted (and is 
still shunting) a decent amount of spending towards cloud platforms.

The easy stuff is moving to IaaS platforms. The hard stuff, the custom 
stuff, the tightly bound stuff and the data/IO-bound stuff is staying 
local of course - but that still means lots of stuff is moving externally.

The article that prompted this thread is a great example of this. The 
client company had a boatload of one-off molecular dynamics simulations 
to run. So many, in fact, that the problem was computationally 
infeasible to even consider doing in-house.

So they did it on AWS.

30,000 CPU cores. For ~$9,000.

Amazing.
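
For scale, here is the back-of-the-envelope arithmetic using only the 
figures in this thread: the $1,279-per-hour rate and 30,000 cores from 
the subject line, and the ~$9,000 total above. It assumes the hourly 
rate was flat for the whole run, which may not be exactly how the job 
was billed.

```python
# Figures taken from the thread; the total is approximate ("~$9,000").
HOURLY_COST_USD = 1279.0
CORES = 30_000
TOTAL_COST_USD = 9000.0

# Assumption: a constant hourly rate across the whole run.
cost_per_core_hour = HOURLY_COST_USD / CORES
implied_hours = TOTAL_COST_USD / HOURLY_COST_USD
implied_core_hours = implied_hours * CORES

print(f"cost per core-hour: ${cost_per_core_hour:.4f}")
print(f"implied run length: {implied_hours:.1f} hours")
print(f"implied core-hours: {implied_core_hours:,.0f}")
```

That works out to roughly 4 cents per core-hour and a run of about 7 
hours.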

It's a fun time to be in HPC actually. And getting my head around "IaaS" 
platforms turned me onto things (like Opscode Chef) that we are now 
bringing in-house and integrating into our legacy clusters and grids.


Sorry for rambling but I think there are 2 main drivers behind what I 
see moving HPC users and applications into IaaS cloud platforms ...


(1) The economies of scale are real. IaaS providers can run better, 
bigger and cheaper than we can and they can still make a profit. This is 
real, not hype or sales BS. (as long as you are honest about your actual 
costs...)


(2) The benefits of "scriptable everything" or "everything has an API". 
I'm so freaking sick of companies installing VMware and excreting a 
press release calling themselves a "cloud provider". Virtual servers and 
virtual block storage on demand are boring, basic and pedestrian. That 
was clever in 2004. I need far more "glue" to build useful stuff in a 
virtual world and IaaS platforms deliver more products/services and 
"glue" options than anyone else out there. The "scriptable everything" 
nature of IaaS is enabling a lot of cool system and workflow building, 
much of which would be hard or almost impossible to do in-house with 
local resources.



My $.02

-Chris

(corporate hat: chris at bioteam.net)






