[Beowulf] PetaBytes on a budget, take 2

Joe Landman landman at scalableinformatics.com
Fri Jul 22 05:13:30 PDT 2011

On 07/22/2011 01:44 AM, Mark Hahn wrote:
>>>> Either way, I think if someone were to foolishly just toss together
>>>> 100TB of data into a box they would have a hell of a time getting
>>>> anywhere near even 10% of the theoretical max performance-wise.
>>> storage isn't about performance any more. ok, hyperbole, a little.
>>> but even a cheap disk does > 100 MB/s, and in all honesty, there are
>>> not tons of people looking for bandwidth more than a small multiplier
>>> of that. sure, a QDR fileserver wants more than a couple disks,
>> With all due respect, I beg to differ.
> with which part?

Mainly "storage isn't about performance any more" and "there are not 
tons of people looking for bandwidth more than a small multiplier of 
that."

To haul out my old joke ... generalizations tend to be incorrect ...

>> The bigger you make your storage, the larger the pipes in you need, and
> hence the QDR comment.

Yeah, well not everyone likes IB.  As much as we've tried to convince 
others that it is a good idea in some cases for their workloads, many 
customers still prefer 10GbE and GbE.  I personally have to admit that 
10GbE and its very simple driver model (and its "just works" concept) is 
incredibly attractive, and often far easier to support than IB.

That said, we've seen/experienced enough very bad IB implementations 
(board level, driver issues, switch issues, overall fabric, ...) that I 
am somewhat more jaded about the real achievable bandwidth with it these 
days than I've been in the past.

Sorta like people throwing disks together into 6G backplanes.  We run 
into this all the time in certain circles.  People tend to think the 
nice label will automatically grant them more performance than before. 
So we see some of the most poorly designed ... hideous really ... 
units from a bandwidth/latency perspective.

I guess what I am saying is that neither QDR nor 10GbE is a silver bullet. 
There are no silver bullets.  You *still* have to start with balanced 
and reasonable designs to get a chance at good performance.

>> the larger the pipes to the storage you need, lest you decide that tape
>> is really cheaper after all.
> people who like tape seem to like it precisely because it's offline.
> BB storage, although fairly bottlenecked, is very much online and thus
> constantly integrity-verifiable...

Extremely bottlenecked.  100TB / 100 MB/s -> 100,000,000 MB / 100 MB/s = 
1,000,000 s to read or write ... once.  This is what we've been calling 
the storage bandwidth wall.  The higher the wall, the colder and more 
inaccessible your data is.   This is on the order of 12 days to read or 
write the data once.
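
The arithmetic above can be checked with a few lines of Python.  This is 
only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a 
sustained 100 MB/s for the whole pass, and the function name is made up 
for illustration.

```python
def wall_seconds(capacity_tb, bandwidth_mb_s):
    """Seconds needed to read or write the full capacity once."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: decimal TB -> MB
    return capacity_mb / bandwidth_mb_s

seconds = wall_seconds(100, 100)   # 100 TB at 100 MB/s
days = seconds / 86_400            # 86,400 seconds per day
print(f"{seconds:,.0f} s, about {days:.1f} days")
# prints: 1,000,000 s, about 11.6 days
```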

My point about these units is that it may be possible to expand the 
capacity so much (without growing the various internal bandwidths) that 
it becomes effectively impossible to utilize all the space, even a 
majority of the space, in a reasonable time.  Which renders the utility 
of such devices moot.
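
To put rough numbers on that, here is a small Python sketch of how the 
wall grows when capacity goes up and bandwidth stays put.  The 100 TB 
and 100 MB/s figures are the ones used above; the larger capacities, the 
one-day target, and the function names are assumptions picked purely for 
illustration, and units are decimal (1 TB = 1,000,000 MB).

```python
SECONDS_PER_DAY = 86_400

def days_to_traverse(capacity_tb, bandwidth_mb_s):
    """Days needed to read or write the whole capacity once."""
    return capacity_tb * 1_000_000 / bandwidth_mb_s / SECONDS_PER_DAY

def required_bandwidth_mb_s(capacity_tb, target_days):
    """Sustained MB/s needed to traverse the capacity in target_days."""
    return capacity_tb * 1_000_000 / (target_days * SECONDS_PER_DAY)

# Hypothetical capacities; bandwidth held fixed at 100 MB/s.
for capacity_tb in (100, 200, 400, 1000):
    wall = days_to_traverse(capacity_tb, 100)
    needed = required_bandwidth_mb_s(capacity_tb, 1)
    print(f"{capacity_tb:5d} TB: {wall:6.1f} days at 100 MB/s, "
          f"{needed:8.0f} MB/s needed for a 1-day pass")
```

The time for one pass scales linearly with capacity at fixed bandwidth, 
so adding space without adding bandwidth just makes the wall higher.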

>> Tape does 100MB/s these days. And the media is relatively cheap
>> (compared to some HD).
> yes, "some" is my favorite weasel word too ;)

Well ... if you're backing up to SSD drives ...  No, seriously not 
weasel wording on this.  Tape is relatively cheap in bulk for larger 

> I don't follow tape prices much - but LTO looks a little more expensive
> than desktop drives. drives still not cheap. my guess is that tape could
> make sense at very large sizes, with enough tapes to amortize the drives,
> and some kind of very large robot. but really my point was that commodity
> capacity and speed covers the vast majority of the market - at least I'm
> guessing that most data is stored in systems of under, say 1 PB.

Understand also that I share your view that commodity drives are the 
better option.  Just pointing out that you can follow your asymptote to 
an extreme (tape) if you wish to keep pushing pricing per byte down.

My biggest argument against tape is that, while the tapes themselves 
may last 20 years or so ... the drives don't.  I've had numerous direct 
experiences with drive failures that wound up resulting in inaccessible 
data.  I fail to see how the longevity of the media matters in this 
case, if you can't read it, or cannot get replacement drives to read it. 
  Yeah, that happened.


> am I the only one who doesn't trust tape? who thinks of integrity
> being a consequence of constant verifiability?

See above.


>> They are interesting boxen. We often ask customers if they'd consider
>> non-enterprise drives. Failure rates similar to the enterprise as it
>> turns out, modulo some ridiculous drive products. Most say no. Those
>> who say yes don't see enhanced failure rates.
> old-fashioned thinking, from the days when disks were expensive.
> now the most expensive commodity disk you can buy is maybe $200,
> so you really have to think of it as a consumable. (yes, people do still
> buy mercedes and SAS/FC/etc disks, but that doesn't make them
> mass-market/commodity products.)

heh ... I can see it now:

Me: "But gee Mr/Ms Customer, thats really old fashioned thinking (and 
Mark told me so!) so you gots ta let me sell you dis cheaper disk ..."

(watches as door closes in face)

It will take time for the business consumer side of the market to adapt and 
adopt.  Some do, most don't.  Aside from that, the drive manufacturers 
just love them margins on the enterprise units ...

And do you see how willingly people pay large multiples of $1/GB for 
SSDs?  Ok, they are now getting closer to $1/GB, but that's still more 
than 1 OOM worse in cost than spinning rust ...

>>> and if you're an iops-head, you're going flash anyway.
>> This is more recent than you might have guessed ... at least outside of
>> academia. We should have a fun machine to talk about next week, and
>> show some benchies on.
> to be honest, I don't understand what applications lead to focus on IOPS
> (rationally, not just aesthetic/ideologically). it also seems like
> battery-backed ram and logging to disks would deliver the same goods...

oh... many.  RAM is expensive.  10TB of RAM is power hungry and very 
expensive.  Bloody fast, but very expensive.  Many apps want fast and cheap.

As to your thesis, in the world we live in today, bandwidth and latency 
are becoming ever more important, not less important.  Maybe for 
specific users this isn't the case, and BB is perfect for that use case. 
  For the general case, we aren't getting people asking us if we can 
raise that storage bandwidth wall.  They are all asking us to lower that 
wall.

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
