[Beowulf] Re: failure trends in a large disk drive population (google fileing system)
momentics
momentics at gmail.com
Mon Feb 19 01:00:26 PST 2007
On 2/19/07, matt jones <jamesjamiejones at aol.com> wrote:
> if one fails there
> are still 3, if another there are still 2. i've also read somewhere else
> that if one fails, it can automatically recreate the image from the
> remaining ones on a spare node.
[...]
> this approach is rather ott, but it works and works well.
Not sure about the Google folks, but we use a reliability model to
calculate the number of nodes and their physical locations (with
continuous scheduling), so that the system meets the expected
reliability coefficient specified by the system
operator/deployer/configurator (for the EE, SW and HW parts).
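To make the node-count calculation concrete, here is a minimal sketch, not our actual model. It assumes node failures are independent and that data survives as long as at least one replica does; the function name replicas_needed and its parameters are made up for illustration.

```python
def replicas_needed(p_fail: float, target: float) -> int:
    """Smallest replica count n with 1 - p_fail**n >= target.

    Assumption (not from the post): failures are independent and the
    data is lost only when all n replicas fail in the same period.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    # Add replicas until the survival probability reaches the target.
    while 1.0 - p_fail ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: 5% per-node failure probability, 99.9% target.
    print(replicas_needed(0.05, 0.999))
```

A real model would also have to handle correlated failures (same rack, same power feed), which is where the physical location of the nodes comes in.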
The HDD is an unreliable part of the system, with a roughly known
(actually, expected) reliability. Moreover, as we know, most HDDs
expose SMART metrics, which are a good way to correct the live
coefficients in the math model being used. The upshot is to use
adaptive techniques.
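A rough sketch of what "correcting live coefficients" from SMART data could look like, under loud assumptions: the attribute names and multiplier values below are placeholders, not measured figures, and the SMART counters are assumed to be already read into a dict by some other tool.

```python
def adjusted_failure_probability(base_p, smart, multipliers):
    """Scale a baseline failure probability by one factor per nonzero SMART counter.

    base_p      -- expected failure probability for a healthy drive
    smart       -- dict of SMART counter name -> current raw value
    multipliers -- dict of counter name -> risk factor (hypothetical values)
    """
    p = base_p
    for attr, factor in multipliers.items():
        # A nonzero counter is treated as a warning sign for this drive.
        if smart.get(attr, 0) > 0:
            p *= factor
    # A probability cannot exceed 1, whatever the factors say.
    return min(p, 1.0)


if __name__ == "__main__":
    # Placeholder factors; a real deployment would fit these to its own data.
    factors = {"reallocated_sectors": 5.0, "pending_sectors": 10.0}
    drive = {"reallocated_sectors": 3, "pending_sectors": 0}
    print(adjusted_failure_probability(0.02, drive, factors))
```

The adjusted value can then be fed back into the reliability model, so that a drive showing warning signs gets more replicas or is scheduled for replacement.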
So Google is probably doing it the same way - a good company anyhow... ta-da! :)
Scal at Grid – http://sgrid.sourceforge.net/
//
(the perfect doc - the amazing work)