[Beowulf] cloud: ho hum?

Joe Landman landman at scalableinformatics.com
Wed Feb 1 07:45:42 PST 2012


On 02/01/2012 10:08 AM, Mark Hahn wrote:
> in hopes of leaving the moderation discussion behind,
> here's a more interesting topic: cloud wrt beowulf/hpc.
>
> when I meet cloud-enthused people, normally I just explain how
> HPC clustering has been doing PaaS cloud all along.  there are some
> people who run with it though: bioinformatics people mostly, who
> take personal affront to the concept of their jobs being queued.

Heh ... to put it mildly, this subset of HPC users tends to be more prone 
to fads than a fair number of others.  As often as not, we have to work to 
solve the real problem in part by helping to unmask the real problem 
(and move past the perceptions of what some CS person told them the 
problems were).

> (they don't seem to understand that queueing is a function of how
> efficiently utilized a cluster is, and since a cloud is indeed a
> cluster, you get queueing in a cloud as well.)

Sort of, but the illusion in a cloud is that it's all theirs, regardless 
of whether or not it's emulated/virtualized/bare metal.
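To put rough numbers on the queueing point quoted above: in the textbook M/M/1 model (Poisson arrivals, exponential run times, one server), the mean time a job waits before starting is rho*S/(1 - rho), where rho is utilization and S is the mean run time. A real cluster or cloud is closer to a multi-server M/M/c queue, so treat this Python sketch as an illustration of the trend, not a model of any particular site; the function name is hypothetical.

```python
def mm1_mean_wait(utilization: float, mean_run_time: float) -> float:
    """Mean queue wait before a job starts, for an M/M/1 queue.

    utilization: rho = arrival rate / service rate; must be in [0, 1).
    mean_run_time: mean job run time S (1 / service rate), any time unit.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    # Textbook M/M/1 result: W_q = rho * S / (1 - rho).
    return utilization * mean_run_time / (1.0 - utilization)


if __name__ == "__main__":
    # With mean_run_time = 1.0, waits are expressed in units of the mean job length.
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        wait = mm1_mean_wait(rho, 1.0)
        print(f"utilization {rho:.2f}: mean wait {wait:6.2f} job-lengths")
```

In this simple model, going from 90% to 99% utilization multiplies the expected wait elevenfold (9 to 99 job-lengths), which is why any well-utilized shared resource queues, whether you call it a cluster or a cloud.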

>
> part of the issue here seems to be that people buy into a couple
> fallacies that they apply to cloud:
>   	- private sector is inherently more efficient.  this is a bit
>   	of a mystery to me, but I guess this is one of the great rhetorical
>   	successes of the neocon movement.  I've looked at Amazon prices,

I'll ignore the obvious (and profoundly incorrect) political stance 
here, and focus upon the (failed) economic argument.  Yes, the 
competitive private sector is *always* more efficient at delivering 
goods and services than the non-competitive government sector.  The only 
time the private sector is less efficient is when there is no meaningful 
competition; then the consumers of a good or service will pay market 
pricing set not by competitive forces, but by the preference of the 
dominant vendor, which does not need to compete to win the business.

For example, in desktop software environments, for the better part of 20 
years, Microsoft has been the dominant player, and has had complete 
freedom to set whatever pricing it wishes.  Now that it faces 
competitive pressure on several fronts, you are seeing pricing starting 
to react accordingly to market forces.

Economics 101 applies:  Competitive market forces enable efficient 
markets.  Non-competitive market forces don't.

>   	and they are remarkably high - depending on purchasing model,
>   	about 20x higher than an academic-run research cluster.  why is there

Hmmm ... I don't think you are taking everything into account, and more 
to the point, you are not comparing apples to apples.  Compare Amazon 
to CRL to Joyent to Sabalcore to ... .  You will find competitive 
pricing among these for similar use cases.  In all cases, your up front 
costs and recurring costs are capped.  You want to use 10k nodes for 1 
hour, you can.  And it won't cost you 10k nodes of capital + 
infrastructure, power, cooling, ... to make it happen.  You want 10k 
nodes for one hour at an academic site?  Get in line, and someone has to 
have laid out the capex for all of this.  Just because you don't see 
this direct cost, or because the chargeback to you as an end user doesn't 
reflect cost recovery and a profit (the latter being irrelevant for most 
academic sites), doesn't mean it "costs 1/20 as much".  It means you 
haven't accounted for the real costs correctly.
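One way to make the "real costs" argument concrete is to compute a fully loaded cost per node-hour actually delivered to jobs. The Python sketch below does that for a hypothetical owned cluster; every dollar figure, the 4-year lifetime, and the 80% utilization are assumptions for illustration, not measured data from any site or provider. It also shows what an end user would see if the chargeback recovered only power, with capital and staff absorbed elsewhere.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class OwnedCluster:
    """Hypothetical cost model for a self-run cluster (all inputs illustrative)."""
    nodes: int
    capex_per_node: float        # purchase price per node
    lifetime_years: float        # straight-line depreciation period
    power_per_node_year: float   # power + cooling per node per year
    staff_per_year: float        # fully loaded staff cost, whole cluster
    facility_per_year: float     # space, network, maintenance, etc.
    utilization: float           # fraction of node-hours used by jobs

    def used_node_hours_per_year(self) -> float:
        return self.nodes * HOURS_PER_YEAR * self.utilization

    def annual_cost(self) -> float:
        capex = self.nodes * self.capex_per_node / self.lifetime_years
        power = self.nodes * self.power_per_node_year
        return capex + power + self.staff_per_year + self.facility_per_year

    def full_cost_per_node_hour(self) -> float:
        """Everything included: what each used node-hour really costs someone."""
        return self.annual_cost() / self.used_node_hours_per_year()

    def power_only_chargeback_per_node_hour(self) -> float:
        """What a user sees if the chargeback recovers only power."""
        return self.nodes * self.power_per_node_year / self.used_node_hours_per_year()


if __name__ == "__main__":
    c = OwnedCluster(nodes=1000, capex_per_node=5000.0, lifetime_years=4.0,
                     power_per_node_year=500.0, staff_per_year=150_000.0,
                     facility_per_year=100_000.0, utilization=0.8)
    print(f"full cost:         {c.full_cost_per_node_hour():.4f} per node-hour")
    print(f"power-only charge: {c.power_only_chargeback_per_node_hour():.4f} per node-hour")
```

With these made-up inputs the annual cost is 2.0M against 0.5M of power, so a power-only chargeback understates the true per-hour cost by a factor of four. The exact ratio depends entirely on the inputs; the gap is the point.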

>   	not more skepticism of outsourcing, since it always means your cost
>   	includes one or more corporate profit margins?

... and is corporate profit a bad thing?  Seriously?

There is a cost associated with you not taking the capital charge for 
the systems you use, or for the OPEX of using them.  Or for the other 
indirect costs surrounding the rest of this.  You are paying for the 
privilege of keeping your costs down.  So, for an academic user who has 
to obtain 10k CPU hours on 1000 CPUs to solve their problem, 
they can a) sign on to, and get a grant for, SHARCNET and others, which 
involve some sort of chargeback mechanism (because SHARCNET and others 
have to pay for their power, cooling, data, people), b) build their own 
cluster (which makes sense only if you do many runs), or c) buy it from 
Amazon/CRL/Sabalcore/... and only pay for what they use and start 
running right away.

So which one makes the most sense?  Rhetorical question to a degree as 
it depends strongly upon the use case, the grant needs, etc.
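For options b) and c), one rough way to frame the choice is a break-even count: owning has a large fixed cost and a low marginal cost per run, while renting has no fixed cost and a higher per-run price. The Python sketch below computes where those two lines cross; the function name and all prices are hypothetical, and it deliberately ignores queue wait, grant rules, and time-to-solution, which can easily dominate in practice.

```python
import math


def break_even_runs(own_fixed_cost: float, own_cost_per_run: float,
                    rent_cost_per_run: float) -> float:
    """Runs at which owning a cluster and renting capacity cost the same.

    own_fixed_cost: up-front cost of building (hardware, install, etc.).
    own_cost_per_run: marginal cost of one run on owned gear (power, staff share).
    rent_cost_per_run: price of one run bought from a cloud/HPC provider.
    Returns math.inf when owning never saves anything per run.
    """
    saving_per_run = rent_cost_per_run - own_cost_per_run
    if saving_per_run <= 0:
        return math.inf
    return own_fixed_cost / saving_per_run


if __name__ == "__main__":
    # One "run" = 10k CPU-hours, as in the example above; prices are made up.
    runs = break_even_runs(own_fixed_cost=500_000.0,
                           own_cost_per_run=200.0,
                           rent_cost_per_run=1_000.0)
    print(f"owning pays off after about {math.ceil(runs)} runs")
```

Below the break-even count, renting is cheaper in these illustrative terms, which is the "makes sense only if you do many runs" caveat on option b).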

>
>   	- economies of scale: people seem to think that a datacenter at the
>   	scale of google/amazon/facebook is going to be dramatically cheaper.

It generally is.

>   	while I'm sure they get a good deal from their suppliers, I also
>   	doubt it's game-changing.  power, for instance, is a relatively
>   	modest portion of costs, ~10% per year of a server's purchase price.

Then why do Google et al colocate their data centers near cheap power if 
power is only a modest/minute fraction of the total cost?  TCO matters, 
and if you have to pay for power 24x7 during the life of the system, you 
want to minimize this cost.  Multiply the cost of power for 1 server by 
100k, add in other bits, and this modest fraction starts adding up to 
significant amounts (and fractions of the total cost), very quickly.  It 
can be game changing.  That is why they locate their data centers where 
there is an optimum (minimizing total lifetime costs of power, taxes, 
etc.) as compared with the nearby data center where you pay a premium 
for convenience.
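The scale argument can be worked through with the ~10%-per-year figure quoted earlier. The Python sketch below treats annual power as a fixed fraction of a server's purchase price; the $5,000 server price, 4-year lifetime, and 30% cheaper power at a remote site are assumptions chosen only to show the arithmetic, and the function names are hypothetical.

```python
def lifetime_power_cost(servers: int, price_per_server: float,
                        power_fraction_per_year: float, lifetime_years: float) -> float:
    """Fleet power spend over its life, with power as a fraction of purchase price per year."""
    return servers * price_per_server * power_fraction_per_year * lifetime_years


def power_share_of_tco(power_fraction_per_year: float, lifetime_years: float) -> float:
    """Power's share of (purchase + lifetime power), ignoring staff, space, taxes, etc."""
    power = power_fraction_per_year * lifetime_years
    return power / (1.0 + power)


if __name__ == "__main__":
    servers, price, frac, years = 100_000, 5_000.0, 0.10, 4.0
    power = lifetime_power_cost(servers, price, frac, years)
    cheaper_site_saving = power * 0.30   # assumed 30% lower power price at the remote site
    print(f"lifetime power: ${power:,.0f}")
    print(f"power share of purchase+power: {power_share_of_tco(frac, years):.1%}")
    print(f"saving from cheaper power: ${cheaper_site_saving:,.0f}")
```

Under these assumptions power is about 29% of purchase-plus-power over the system's life, not 10%, and a 30% cheaper site saves on the order of $60M across the fleet. That is the sense in which a "modest" per-server fraction becomes significant at scale.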

>   	machineroom cost is pretty linear with number of nodes (power);
>   	people overhead is very small (say,>  1000 servers per fte.)
>
> most of all, I just don't see how cloud changes the HPC picture at all.
> HPC is already based on shared resources handling burstiness of demand -

Not all HPC is this way.  Actually most isn't.

> if anything, cloud is simply slower.  certainly I can't submit a job to
> EC2 that uses half the Virginia zone and expect it to run immediately.
> it's not clear to me whether cloud-pushers are getting real traction with
> the funding agencies (gov is neocon here in Canada.)  it worries me that
> cloud might be framed as "better computing than HPC".

Hmmm.

>
> I'm curious: what kind of cloudiness are you seeing?

Quite a bit.  People are looking at clouds for private use with trivial 
extension to public usage for computing.  We are seeing huge amounts of 
private storage cloud builds.

Cloud is ASP v3 (or v4 if you count clusters).  In ASPs, large external 
high cost gear was centralized.  The economics simply didn't work, and 
this model died.  Clusters started around then.  Grid/Utility Computing 
started around then, and Amazon launched their offering at the notional 
end of this market.  Grid was largely a bust from a commercial view, as 
it again had bad economics.  Clusters were in full blossom then; the 
economics favored them.  If you like to look at clusters as ASP v3, you 
can, though they've been running alongside the fads.  Cloud is ASP 
v3 or v4 (if you say clusters were v3): the natural evolution of taking a 
cluster and putting a VM on demand on it, or running something bare metal 
on it.  Where it's located matters to a degree, and data motion is still 
the hardest problem, and it's getting harder.  This is why private data 
clouds (and computing clouds) are getting more popular.


This said, like all other fads/trends, Cloud is (massively over-)hyped. 
It has value, and it has staying power (unlike grid, ASP, ...).  It solves 
a specific set of problems, and does so well, and you pay a premium for 
solving that set of problems in that manner.  We see more folks 
building private clouds (e.g. clusters with more intelligent 
allocation/provisioning) than we see people running exclusively on the cloud.

In financial services, we've had customers tell us how wonderful it was 
(from a convenience view) and how awful it was (from a performance 
view).  It matters more to people who care about getting cycles than to 
people who care about getting really good single CPU performance.  Cloud 
is a throughput engine, and this mode of operation is becoming more 
important over time.  Even in HPC.  Especially with BigData (hey, wanna 
talk about a massively over-hyped term?  There's one for ya ... the 
hype masks the real issues, and this is a shame, but such is life).

And for what it's worth, VCs are positively throwing money at cloud/big 
data companies.  This doesn't make it better.  Probably worse.  But 
that's a whole other discussion.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615



