[Beowulf] PetaBytes on a budget, take 2

Ellis H. Wilson III ellis at runnersroll.com
Fri Jul 22 06:05:52 PDT 2011

On 07/22/11 08:13, Joe Landman wrote:
> On 07/22/2011 01:44 AM, Mark Hahn wrote:
>>>>> Either way, I think if someone were to foolishly just toss together
>>>>> 100TB of data into a box they would have a hell of a time getting
>>>>> anywhere near even 10% of the theoretical max performance-wise.
>>>> storage isn't about performance any more. ok, hyperbole, a little.
>>>> but even a cheap disk does > 100 MB/s, and in all honesty, there are
>>>> not tons of people looking for bandwidth more than a small multiplier
>>>> of that. sure, a QDR fileserver wants more than a couple disks,
>>> With all due respect, I beg to differ.
>> with which part?
> Mainly "storage isn't about performance any more" and "there are not 
> tons of people looking for bandwidth more than a small multiplier of 
> that".
> To haul out my old joke ... generalizations tend to be incorrect ...

It's pretty nice to wake up in the morning and find that somebody else
has said everything nearly exactly as I would have.  Nice write-up, Joe!

And to Greg - we can talk semantics until we're blue in the face, but
the reality is that Hadoop/HDFS/R3 is simply not an appropriate solution
for basic backups, which is the topic of this thread.  Period.  It's a
fabulous tool for actually /working/ on big data, and I /really/ do like
Hadoop, but it's a very poor tool when all you want is high-bandwidth
sequential writes or reads.  If you disagree - fine - it's my opinion
and I'm sticking to it.

Regarding trusting your vendor's RAID code less than replication code:
that's pretty obvious.  I think we can all agree that cp x 3 is a much
less complex solution.
