[Beowulf] PetaBytes on a budget, take 2

Ellis H. Wilson III ellis at runnersroll.com
Thu Jul 21 17:03:58 PDT 2011


On 07/21/11 18:07, Greg Lindahl wrote:
> On Thu, Jul 21, 2011 at 02:55:30PM -0400, Ellis H. Wilson III wrote:
>> My personal experience with getting large amounts of data from local
>> storage to HDFS has been suboptimal compared to something more raw,
> 
> If you're writing 3 copies of everything on 3 different nodes, then
> sure, it's a lot slower than writing 1 copy. The benefit you get from
> this extra up-front expense is resilience.

Used in a backup solution, triplication won't get you much more
resilience than RAID6, but you will pay a much greater performance
penalty simply to get your backup or checkpoint completed.  Additionally,
unless you have a ton of these boxes you won't get some of the important
benefits of Hadoop, such as rack-aware replica placement.  Perhaps
you could alter HDFS to handle triplication in the background once you
get the local copy on disk, but this isn't really what it was built for,
so again one is probably better off going with a more efficient, if less
complex, distributed file system.
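
For a rough sense of the raw-write cost, here is a back-of-the-envelope
sketch in Python.  The function names and disk counts are made up for
illustration, and it assumes full-stripe RAID6 writes while ignoring
network transfer, metadata, and any caching effects:

```python
def replication_write_factor(replicas: int) -> float:
    """Raw bytes written per logical byte when every block is stored
    `replicas` times (3 for the triplication discussed here)."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return float(replicas)


def raid6_write_factor(disks: int) -> float:
    """Raw bytes written per logical byte for a full-stripe RAID6 write:
    (disks - 2) data chunks plus 2 parity chunks per stripe."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return disks / (disks - 2)


if __name__ == "__main__":
    print(f"3 replicas: {replication_write_factor(3):.2f}x raw writes")
    # Hypothetical array sizes, chosen only for illustration.
    for disks in (6, 8, 12):
        print(f"RAID6, {disks} disks: {raid6_write_factor(disks):.2f}x raw writes")
```

Under those assumptions a wider array pushes the RAID6 factor toward 1x,
while triplication stays at 3x regardless of how many boxes you have.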

ellis


