[Beowulf] PetaBytes on a budget, take 2

Mark Hahn hahn at mcmaster.ca
Thu Jul 21 22:44:56 PDT 2011


>>> Either way, I think if someone were to foolishly just toss together
>>> 100TB of data into a box they would have a hell of a time getting
>>> anywhere near even 10% of the theoretical max performance-wise.
>>
>> storage isn't about performance any more.  ok, hyperbole, a little.
>> but even a cheap disk does > 100 MB/s, and in all honesty, there are
>> not tons of people looking for bandwidth more than a small multiplier
>> of that.  sure, a QDR fileserver wants more than a couple disks,
>
> With all due respect, I beg to differ.

with which part?

> The bigger you make your storage, the larger the pipes in you need, and

hence the QDR comment.

> the larger the pipes to the storage you need, lest you decide that tape
> is really cheaper after all.

people who like tape seem to like it precisely because it's offline.
backblaze storage, although fairly bottlenecked, is very much online and
thus constantly integrity-verifiable...
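
to put a rough number on "bottlenecked": a back-of-envelope sketch of how
long one full read pass over a box takes.  the 135 TB and 100 MB/s figures
are the ones from this thread; the 1000 MB/s aggregate pipe is purely an
assumption for comparison, and units are decimal (1 TB = 1e6 MB).

```python
def full_read_hours(capacity_tb, bandwidth_mb_s):
    """Hours needed to read capacity_tb terabytes at bandwidth_mb_s MB/s."""
    seconds = capacity_tb * 1e6 / bandwidth_mb_s  # 1 TB = 1e6 MB (decimal)
    return seconds / 3600


# 135 TB through one 100 MB/s stream vs. an assumed 1000 MB/s aggregate pipe
for bandwidth in (100, 1000):
    hours = full_read_hours(135, bandwidth)
    print(f"{bandwidth} MB/s: {hours:.1f} hours per full pass")
```

so even through a single cheap-disk-sized pipe a full verification pass is
a matter of weeks, not years, and it scales down linearly with whatever
aggregate bandwidth the box can actually sustain.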

> Tape does 100MB/s these days.  And the media is relatively cheap
> (compared to some HD).

yes, "some" is my favorite weasel word too ;)
I don't follow tape prices much - but LTO looks a little more expensive
than desktop drives.  tape drives are still not cheap.  my guess is that tape could
make sense at very large sizes, with enough tapes to amortize the drives,
and some kind of very large robot.  but really my point was that commodity
capacity and speed covers the vast majority of the market - at least I'm
guessing that most data is stored in systems of under, say, 1 PB.
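
the amortization argument can be sketched as a break-even capacity.  this is
a toy model, not real pricing: fixed_cost stands for tape drives plus robot,
and every number in the example call is a made-up placeholder.

```python
def break_even_tb(fixed_cost, tape_cost_per_tb, disk_cost_per_tb):
    """Capacity (TB) above which tape is cheaper than disk in a toy model.

    tape total = fixed_cost + tape_cost_per_tb * capacity
    disk total = disk_cost_per_tb * capacity
    Returns None when tape media is not cheaper per TB than disk,
    because then no amount of capacity amortizes the fixed cost.
    """
    if disk_cost_per_tb <= tape_cost_per_tb:
        return None
    return fixed_cost / (disk_cost_per_tb - tape_cost_per_tb)


# placeholder numbers only, chosen to make the arithmetic easy to follow
print(break_even_tb(fixed_cost=10_000, tape_cost_per_tb=20, disk_cost_per_tb=40))
# media pricier per TB than disk: tape never breaks even in this model
print(break_even_tb(fixed_cost=10_000, tape_cost_per_tb=50, disk_cost_per_tb=40))
```

if tape media really is a little more expensive per TB than desktop drives,
the second case applies and the fixed cost never pays for itself on price
alone.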

if tape could deliver 135 TB in 4U with 10ms random access, yes,
I guess there wouldn't be any point to backblaze...

> If you don't care about access performance under
> load, you really can't beat its economics.

am I the only one who doesn't trust tape?  who thinks of integrity
as a consequence of constant verifiability?
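
by "constant verifiability" I mean something like a periodic scrub.  a
minimal sketch, assuming a plain directory tree and a checksum manifest kept
somewhere else; build_manifest and scrub are hypothetical helper names, not
part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root, manifest):
    """Re-read every file in the manifest; return paths that are missing or changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

on online disk this can run continuously in the background; on offline tape
each pass means mounting and streaming every cartridge.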

> More to the point, you need a really balanced architecture in terms of
> bandwidth.  I think USB3 could be very interesting for small arrays, and
> pretty much expect to start seeing some as block targets pretty soon.  I
> don't see enough aggregated USB3 ports together in a single machine to
> make this terribly interesting as a large scale storage medium, but it
> is a possible route.

hard to imagine a sane way to distribute power to lots of external
usb enclosures, let alone how to mount them.

> They are interesting boxen.  We often ask customers if they'd consider
> non-enterprise drives.  Failure rates similar to the enterprise as it
> turns out, modulo some ridiculous drive products.  Most say no.  Those
> who say yes don't see enhanced failure rates.

old-fashioned thinking, from the days when disks were expensive.
now the most expensive commodity disk you can buy is maybe $200,
so you really have to think of it as a consumable.  (yes, people 
do still buy mercedes and SAS/FC/etc disks, but that doesn't make 
them mass-market/commodity products.)

>> and if you're an iops-head, you're going flash anyway.
>
> This is more recent than you might have guessed ... at least outside of
> academia.  We should have a fun machine to talk about next week, and
> show some benchies on.

to be honest, I don't understand what applications lead to a focus on IOPS
(rationally, not just aesthetically/ideologically).  it also seems like
battery-backed ram and logging to disks would deliver the same goods...


