[Beowulf] Station wagon full of tapes
Greg Keller
Greg at keller.net
Tue May 26 11:10:04 PDT 2009
On May 26, 2009, at 10:20 AM, "Robert G. Brown" <rgb at phy.duke.edu> wrote:
>
> On Tue, 26 May 2009, Jeff Layton wrote:
>
>> I haven't seen the cloud ready yet for anything other than
>> embarrassingly parallel codes (i.e. single node, small IO
>> requirements). Has anyone seen differently? (As an example of what
>> might work, CloudBurst seems to be gaining some traction - doing
>> sequencing in the cloud. The only problem is that sequencing can
>> generate a great deal of data pretty rapidly.)
>
> I'm pretty skeptical of commercial rent-a-cluster business models.
To me "Cloud" implies "embarrassingly parallel" infrastructure. High
speed interconnects (which I would loosely define as "better than std
GbE" are not cloud friendly... and really require a "Cluster for Hire"
arrangement. Also, there's no "throat to choke" in the clouds, but my
customers know how to find me to let me know when something isn't fair
or could be done better for their specific use case.
It's actually a great business when you have the right mix of clients
and expectations. IRS depreciation rules make it tough to survive...
but this is a double edged sword since depreciation rules drive some
of the bleeding edge users to external systems so they can stay
bleeding edge every year without internal accounting fights.
> Is this true in cluster computing? Nearly every cluster computer user
> I know of wants "infinite capacity", not finite capacity. They are
> limited by their budget, not their needs. Give them more capacity,
> they'll scale up their computation and finish it faster or do it
> bigger or both.
This is where the "value" of a 2nd- or 3rd-year cluster really comes
through for "value" researchers: would they rather spend their money
on a few systems for a long run, or a lot of systems for a short run?
Since many researchers are "project" based and funded, access to a
large system for short bursts of time can help a single researcher
work through more "projects" or ideas faster than on their own small
cluster, even if the per-hour charge is higher... they accomplish
more with their time and work through more brilliant ideas. We can
work out the "time value" of money, but what is the "time value" of a
brilliant mind waiting for an answer? This is the reason
Departmental- or Enterprise-class HPC systems *should* be the minimum
scale an organization builds in house. Some commercial ISVs will even
donate temporary licenses to help their customers test scaling limits
before a big purchase is made.
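To put rough numbers on that trade-off (every figure below is a
made-up illustration, not anyone's actual rates):

# Back-of-envelope: finish the same work on owned vs. rented cores.
# All numbers are hypothetical illustrations.
job_core_hours = 500_000        # total compute the project needs

owned_cores = 128               # small in-house cluster
rented_cores = 2048             # short burst on a hired system
rate_per_core_hour = 0.10       # hypothetical rental price, dollars

time_owned = job_core_hours / owned_cores      # ~3,906 hours (~5.4 months)
time_rented = job_core_hours / rented_cores    # ~244 hours (~10 days)
rental_cost = job_core_hours * rate_per_core_hour

print(f"owned:  {time_owned:,.0f} h to the answer")
print(f"rented: {time_rented:,.0f} h to the answer, ${rental_cost:,.0f}")
# The core-hours are identical; what the rental buys is the 16x
# shorter wait: the "time value" of the researcher's attention.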
> Plus edge cases -- somebody that needs a cluster desperately, but only
> for six months and to do one single computation (how common is that?
> not very, I think).
Six months is a long time. We routinely see requests for hundreds of
compute cores in the < 1 month range. Those users aren't "edge" cases
or "desperate"; they just understand the "inertia" of their internal
systems group, vendors, etc., and "peak shaving" is the best way to
keep their in-house system from being crushed under the weight of the
queues... while justifying longer-term increases in capacity in
house. Outsourced computing for this shouldn't be thought of as "all
or nothing"; it's just an added tool for the researcher and their IT
staff to use when it fits their needs.
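A toy sketch of that "peak shaving" decision (the threshold and the
numbers are invented for illustration):

# Route a new job in-house or off-site based on projected queue wait.
# Hypothetical policy sketch; a real site would use its scheduler's
# own wait estimates rather than this crude division.
def place_job(queued_core_hours, inhouse_cores, max_wait_hours=48.0):
    """Return where a newly submitted job should run."""
    projected_wait = queued_core_hours / inhouse_cores  # crude estimate
    return "in-house" if projected_wait <= max_wait_hours else "off-site"

print(place_job(queued_core_hours=20_000, inhouse_cores=512))   # in-house
print(place_job(queued_core_hours=200_000, inhouse_cores=512))  # off-site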
The researcher who needs the extra cycles may not be the one who
ultimately computes off site, either... if they have peers on the
internal system who can more easily live with the limits of network
bandwidth, then bartering can ensue and help everyone get more done
without blowing the budget or wasting a lot of wall-clock time.
Ultimately, FedEx is the highest-bandwidth network, with terrible
latency. We have rarely had to resort to this "fall back" network so
far, but we are glad to know it is there and simple to manage.
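The arithmetic behind that old joke, with hypothetical round numbers:

# Effective bandwidth of overnight-shipping a box of tapes vs. GbE.
# All figures are hypothetical round numbers.
tapes = 20
tape_tb = 0.8                       # ~800 GB per cartridge (LTO-4 class)
transit_s = 24 * 3600               # overnight shipping

payload_bits = tapes * tape_tb * 1e12 * 8
print(f"shipped: {payload_bits / transit_s / 1e9:.1f} Gb/s, latency 24 h")
print("GbE:     1.0 Gb/s, latency in milliseconds")
# 16 TB lands in a day; a saturated GbE link moves ~10.8 TB in a day.
# The box wins on bandwidth and loses on latency, as always.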
In the end, it's all about balancing the bleeding-edge user's needs
with the value-edge user's needs. If we can load the teeter-totter
with enough of each, much more research will be accomplished at a
better overall value for all the researchers. The business of selling
cycles on shared systems has been around since the first "virtual
machines" hit the mainframe market long ago. It's been called many
things and funded many ways, but in the end, scaling up to a specific
plateau is more efficient, assuming the machine stays reasonably
loaded over its useful life.
Researchers able to suffer occasionally long availability delays and
"preemption" will find some really good deals backfilling our
systems. This way we can have our utilization and our availability
(for unexpected jobs) too. Someday it should be considered an ethical
and ecological crime to let cycles go unused during the short life of
a research computer. It should be an implicit goal for every
researcher who controls a system to keep it doing useful work for
themselves or their peers.
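A minimal sketch of the backfill idea (policy only; the names are
made up, and real batch schedulers implement this far more carefully):

# Preemptible "backfill" jobs soak up idle nodes and get bumped the
# moment full-price on-demand work arrives. Hypothetical toy policy.
class Node:
    def __init__(self):
        self.job = None
        self.preemptible = False

def submit(nodes, job, preemptible):
    for n in nodes:                      # first choice: an idle node
        if n.job is None:
            n.job, n.preemptible = job, preemptible
            return True
    if not preemptible:                  # on-demand work may bump backfill
        for n in nodes:
            if n.preemptible:
                print(f"preempting {n.job}")   # requeue it for later
                n.job, n.preemptible = job, False
                return True
    return False                         # cluster genuinely full

cluster = [Node() for _ in range(2)]
submit(cluster, "backfill-A", preemptible=True)
submit(cluster, "backfill-B", preemptible=True)
submit(cluster, "on-demand-1", preemptible=False)  # bumps backfill-A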
Full disclosure: I am a part owner of a "cluster for hire" business
trying to assemble the "wasted" cycles under so many desks to benefit
all the researchers who won't miss the whirr and buzz under their
desks... Heck, we'll even put our nodes under their desk if that's
what they need us to do :)
Sincerely,
Greg W. Keller
R Systems NA, inc.