[Beowulf] $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud
Chris Dagdigian
dag at sonsorol.org
Tue Oct 4 12:29:28 PDT 2011
I'm largely with RGB on this one with the minor caveat that I think he
might be undervaluing the insane economies of scale that IaaS providers
like Amazon & Google can provide.
At the scale that Amazon operates at, they can obtain and run
infrastructure far, far more efficiently than most (if not all) of us
can ourselves. These folks have exabytes of spinning disk, redundant
data-centers (with insane PUE values) all over the world and they know
how to manage hundreds of thousands of servers with high efficiency in a
very hostile networking environment. Not only can they run bigger and
more efficient than we can, they can charge a price that makes them a
profit while still being (in many cases) far cheaper than my own costs
should I be truly honest about the fully-loaded costs of maintaining HPC
or IT services.
AWS has a history of lowering prices as their own costs go down. You can
see this via the EC2 pricing history as well as the now-down-to-zero
cost of inbound data transit.
AWS Spot market makes this even more interesting. I can currently run an
m1.4xlarge 64bit server instance with 15GB RAM for about $.24 per hour -
close to 50% cheaper than the published hourly price and that spot price
can hold steady for weeks at a time in many cases.
The biggest hangup is the economics. Even harder in an academic
environment where researchers are used to seeing their funds vanish to
"overhead" on their grant or they just assume that datacenters,
bandwidth, power and hosting are all "free" to use.
It's hard to do true cost comparisons but time and time again I've seen
IaaS come out ahead when the fully-loaded costs are actually put down on
paper.
Here is a cliche example: Amazon S3
Before the S3 object storage service will even *acknowledge* a
successful PUT request, your file is already at rest in at least three
amazon facilities.
So to "really" compare S3 against what you can do locally you at least
have to factor in the cost of your organization being able to provide 3x
multi-facility replication for whatever object store you choose to deploy...
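A rough sketch of that comparison in Python, for anyone who wants to plug in their own numbers. Everything here is an assumption for illustration: the function name, the overhead parameter and the dollar figures are placeholders, not real quotes from any vendor or site. Only the replication factor of 3 comes from the S3 example above.

```python
def local_cost_per_tb_month(raw_cost_per_tb_month, replicas=3, overhead_fraction=0.0):
    """Monthly cost of one usable TB stored locally, once it is replicated.

    raw_cost_per_tb_month: hypothetical cost of ONE copy at ONE site
    replicas: number of facilities holding a full copy (3, as in the S3 example)
    overhead_fraction: hypothetical extra for staff, power and inter-site links
    """
    return raw_cost_per_tb_month * replicas * (1.0 + overhead_fraction)


# Placeholder inputs only; substitute your own fully-loaded figures.
single_copy_cost = 40.0
replicated_cost = local_cost_per_tb_month(single_copy_cost, replicas=3,
                                          overhead_fraction=0.25)
print(replicated_cost)  # 150.0
```

The point is only that the local number to compare against the provider's price is the replicated, fully-loaded one, not the single-copy hardware cost.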
I don't want to be seen as a shill so I'll stop with that example. The
results really are surprising once you start down the "true cost of IT
services..." road.
As for industry trends with HPC and IaaS ...
I can assure you that in the super practical & cynical world of biotech
and pharma there is an HPC migration to IaaS platforms that is already
years old. It's a lot easier to see where and how your money is
being spent inside a biotech startup or pharma, and that visibility has
shunted (and is still shunting) a decent amount of spending towards
cloud platforms.
The easy stuff is moving to IaaS platforms. The hard stuff, the custom
stuff, the tightly bound stuff and the data/IO-bound stuff is staying
local of course - but that still means lots of stuff is moving externally.
The article that prompted this thread is a great example of this. The
client company had a boatload of one-off molecular dynamics simulations
to run. So many, in fact, that the problem was computationally
infeasible to even consider doing in-house.
So they did it on AWS.
30,000 CPU cores. For ~$9,000.
Amazing.
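For a sense of scale, here is the back-of-the-envelope arithmetic as a small Python sketch. The hourly rate and core count come from the subject line and the total is the approximate figure quoted above. The assumption (mine, not the article's) is that the whole cluster ran at that flat hourly rate for the entire job, so treat the results as rough.

```python
HOURLY_RATE_USD = 1279.0  # from the subject line
CORES = 30000             # from the subject line
TOTAL_USD = 9000.0        # approximate total quoted above


def cost_per_core_hour(hourly_rate, cores):
    """Cluster hourly rate divided evenly across all cores."""
    return hourly_rate / cores


def implied_hours(total, hourly_rate):
    """Wall-clock hours implied by the total, assuming a flat hourly rate."""
    return total / hourly_rate


per_core = cost_per_core_hour(HOURLY_RATE_USD, CORES)
hours = implied_hours(TOTAL_USD, HOURLY_RATE_USD)

print(f"~${per_core:.4f} per core-hour")         # ~$0.0426 per core-hour
print(f"~{hours:.1f} hours of wall-clock time")  # ~7.0 hours of wall-clock time
print(f"~{hours * CORES:,.0f} core-hours in total")
```

So roughly four cents per core-hour, with no hardware to buy, rack or decommission afterwards.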
It's a fun time to be in HPC actually. And getting my head around "IaaS"
platforms turned me on to things (like Opscode Chef) that we are now
bringing in-house and integrating into our legacy clusters and grids.
Sorry for rambling but I think there are 2 main drivers behind what I
see moving HPC users and applications into IaaS cloud platforms ...
(1) The economies of scale are real. IaaS providers can run better,
bigger and cheaper than we can and they can still make a profit. This is
real, not hype or sales BS. (as long as you are honest about your actual
costs...)
(2) The benefits of "scriptable everything" or "everything has an API".
I'm so freaking sick of companies installing VMware and excreting a
press release calling themselves a "cloud provider". Virtual servers and
virtual block storage on demand are boring, basic and pedestrian. That
was clever in 2004. I need far more "glue" to build useful stuff in a
virtual world and IaaS platforms deliver more products/services and
"glue" options than anyone else out there. The "scriptable everything"
nature of IaaS is enabling a lot of cool system and workflow building,
much of which would be hard or almost impossible to do in-house with
local resources.
My $.02
-Chris
(corporate hat: chris at bioteam.net)