[Beowulf] PetaBytes on a budget, take 2
Ellis H. Wilson III
ellis at runnersroll.com
Thu Jul 21 17:03:58 PDT 2011
On 07/21/11 18:07, Greg Lindahl wrote:
> On Thu, Jul 21, 2011 at 02:55:30PM -0400, Ellis H. Wilson III wrote:
>> My personal experience with getting large amounts of data from local
>> storage to HDFS has been suboptimal compared to something more raw,
>
> If you're writing 3 copies of everything on 3 different nodes, then
> sure, it's a lot slower than writing 1 copy. The benefit you get from
> this extra up-front expense is resilience.
Used in a backup solution, triplication won't get you much more
resilience than RAID6, but you will pay a much greater performance
penalty simply to get your backup or checkpoint completed. Additionally,
unless you have a ton of these boxes you won't get some of the important
benefits of Hadoop, such as rack-aware replica placement. Perhaps
you could alter HDFS to handle triplication in the background once you
get the local copy on disk, but this isn't really what it was built for,
so again one is probably better off going with a more efficient, if less
complex, distributed file system.
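
To put rough numbers on the trade-off, here is a back-of-the-envelope
sketch. It assumes a hypothetical 10-disk RAID6 array doing full-stripe
writes and counts only bytes written and failures tolerated. It ignores
network transfer for the remote replicas, and it does not distinguish
losing a whole node from losing a disk inside one box. The function
names are made up for illustration.

```python
def replication_cost(copies):
    """Return (bytes written per byte of data, copy losses survivable)."""
    if copies < 1:
        raise ValueError("need at least one copy")
    # Every byte is written once per copy; data survives while one copy remains.
    return float(copies), copies - 1


def raid6_cost(total_disks):
    """Return (bytes written per byte of data, disk losses survivable).

    Assumes full-stripe writes: each stripe holds (total_disks - 2) data
    blocks plus 2 parity blocks.
    """
    if total_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    data_disks = total_disks - 2
    return total_disks / data_disks, 2


if __name__ == "__main__":
    # Assumption: 3 copies (triplication) versus a hypothetical 10-disk array.
    for label, (written, survivable) in [
        ("3x replication", replication_cost(3)),
        ("RAID6, 10 disks", raid6_cost(10)),
    ]:
        print(f"{label}: {written:.2f} bytes written per data byte, "
              f"survives {survivable} failures")
```

Under these assumptions both schemes survive two failures, but
triplication writes 3 bytes per byte of data against 1.25 for the
array, which is where the extra time to finish a backup comes from.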
ellis