[Beowulf] Re: failure trends in a large disk drive population (google fileing system)
matt jones
jamesjamiejones at aol.com
Sun Feb 18 13:49:47 PST 2007
I've read somewhere that the Google File System keeps several copies
of the data, often 4 copies on different nodes, and, as you say, sends
the query to many of them. If one node fails there are still 3 copies;
if another fails there are still 2. I've also read elsewhere that when
one fails, it can automatically recreate the copy from the remaining
ones on a spare node, bringing the count back to 4. This approach is
rather over the top, but it works, and works well.
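A minimal sketch of that re-replication idea, in Python. This is not
how GFS actually implements it; the function names (place_replicas,
re_replicate), the 10-node cluster and the target of 4 copies are all
assumptions made up for illustration.

```python
import random

def place_replicas(nodes, copies, rng):
    """Pick `copies` distinct nodes to hold one chunk of data."""
    return set(rng.sample(sorted(nodes), copies))

def re_replicate(holders, live_nodes, copies, rng):
    """Drop failed holders, then copy onto spare live nodes
    until the chunk is back to `copies` replicas (if spares allow)."""
    survivors = holders & live_nodes
    if not survivors:
        raise RuntimeError("all replicas lost; nothing left to copy from")
    spares = sorted(live_nodes - survivors)
    needed = max(0, min(copies - len(survivors), len(spares)))
    return survivors | set(rng.sample(spares, needed))

# Hypothetical cluster: 10 nodes, 4 copies of one chunk.
rng = random.Random(0)
nodes = {f"node{i}" for i in range(10)}
holders = place_replicas(nodes, 4, rng)

# One holder dies: 3 copies remain, then a spare brings it back to 4.
failed = sorted(holders)[0]
live = nodes - {failed}
holders = re_replicate(holders, live, 4, rng)
print(sorted(holders))
```

The chunk stays readable for the whole time as long as at least one
holder survives; the spare only restores the safety margin.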
I suspect this sort of thing could be done more cheaply by keeping just
3 copies and hoping that you never lose 2 or more of those nodes at once.
Essentially this is a huge distributed file system with integrated RAID
software.
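To put a rough number on the 3-versus-4 trade-off, here is a
back-of-envelope calculation. It assumes the copies of a chunk sit on
distinct nodes chosen uniformly at random, and that a fixed number of
nodes fail at the same instant, before any re-replication can happen.
The cluster size of 100 and the function name p_chunk_lost are
hypothetical.

```python
from math import comb

def p_chunk_lost(total_nodes, replicas, failed_nodes):
    """Probability that every replica of one chunk is on a failed node,
    given `failed_nodes` distinct nodes fail simultaneously."""
    if failed_nodes < replicas:
        return 0.0  # not enough failures to cover every copy
    # The chunk is lost only if all of its holders are among the failed nodes.
    return comb(failed_nodes, replicas) / comb(total_nodes, replicas)

for replicas in (3, 4):
    for failed in (2, 3, 5):
        p = p_chunk_lost(100, replicas, failed)
        print(f"{replicas} copies, {failed} nodes down: {p:.3e}")
```

Under these assumptions, 3 copies still survive any 2 simultaneous
failures; a chunk is only gone once all 3 of its holders are down at
the same time.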
Chris Samuel wrote:
> IIRC they also have figured out a way to be fault tolerant by sending
> queries out to multiple systems for each part of the DB they are
> querying, so if one of those fails others will respond anyway.
>
> Apparently they use more reliable hardware for things like the
> advertising service
--
matt.