[Beowulf] MD check/scrub
Bill Broadley
bill at cse.ucdavis.edu
Tue Nov 13 10:03:22 PST 2007
Leif Nixon wrote:
> Reconstruction. With raid 6, you can recover from single-disk
> corruption (as opposed to *failures*, where you get read errors from a
> disk; raid 6 can handle two simultaneous disk *failures*).
>
> See section 4 in:
>
> http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
>
I just read it.
> Just recalculating the parity blocks does give you a consistent raid
> stripe, but destroys your data (unless it actually was one of the
> parity blocks that was corrupted).
Er, that's not how I read it at all. To quote:
  "In the case of data drive corruption, once the faulty drive has been
  identified, recover using the P drive in the same way as a one-disk erasure
  failure."
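To make the quoted passage concrete, here is a minimal in-memory sketch of the single-corruption case, not md's actual code. It assumes the field GF(2^8) with reducing polynomial 0x11d and generator 2 (which I believe is what the paper uses), P as plain XOR and Q as the sum of g^i * D_i. If exactly one data block z is silently corrupted, recomputing P and Q from the on-disk data gives differences P* and Q* with Q* = g^z * P*, which locates z. The block is then rebuilt from P exactly like a one-disk erasure. The names `syndromes`, `find_corrupt` and `repair` are hypothetical.

```python
# Minimal RAID-6 single-corruption sketch over GF(2^8), polynomial 0x11d, generator 2.
GF_EXP = [0] * 512  # antilog table, doubled so log sums need no modulo
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Compute P (plain XOR) and Q (sum of g^i * D_i) for a list of equal-length blocks."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(data):
        coeff = GF_EXP[i]  # g^i
        for k in range(n):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)


def find_corrupt(data, p, q):
    """Return "ok", "P", "Q", the index of one bad data block, or None if that fails."""
    p_calc, q_calc = syndromes(data)
    p_star = bytes(a ^ b for a, b in zip(p, p_calc))
    q_star = bytes(a ^ b for a, b in zip(q, q_calc))
    if not any(p_star) and not any(q_star):
        return "ok"
    if not any(q_star):
        return "P"  # only the P block disagrees
    if not any(p_star):
        return "Q"  # only the Q block disagrees
    # One bad data block z means Q* = g^z * P* in every byte, so z = log(Q*) - log(P*).
    z = None
    for a, b in zip(p_star, q_star):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return None  # not explainable by a single bad data block
        zk = (GF_LOG[b] - GF_LOG[a]) % 255
        if z is None:
            z = zk
        elif z != zk:
            return None  # bytes point at different drives: more than one corruption
    return z if z < len(data) else None


def repair(data, p, q):
    """Fix a single corrupted block; returns (data, p, q)."""
    where = find_corrupt(data, p, q)
    if where is None:
        raise ValueError("not a single-block corruption")
    if where == "ok":
        return list(data), p, q
    if where in ("P", "Q"):
        p_new, q_new = syndromes(data)  # data is good, so regenerating parity is safe
        return list(data), p_new, q_new
    # Rebuild data block `where` from P and the other data blocks (one-disk erasure).
    fixed = bytearray(p)
    for i, block in enumerate(data):
        if i != where:
            for k in range(len(fixed)):
                fixed[k] ^= block[k]
    data = list(data)
    data[where] = bytes(fixed)
    return data, p, q
```

For example, with `good = [b"\x11\x22\x33\x44", b"\x55\x66\x77\x88", b"\x99\xaa\xbb\xcc"]`, `p, q = syndromes(good)` and block 1 overwritten with `b"\x55\x00\x77\x88"`, `find_corrupt` returns `1` and `repair` restores the original blocks. Simply recomputing P and Q from the corrupted data instead makes `find_corrupt` report `"ok"` while the bad block stays bad, which is the outcome Leif was worried about.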
So you want to catch these single-disk corruptions (data or parity) as soon
as possible, before they accumulate into two bad blocks in the same stripe,
which can no longer be located and fixed this way. In general, if you have the
redundancy at the software RAID level, it seems best not to push too hard on
the individual drive. Don't retry excessively (relying on the per-block
checksums) or allow long timeouts. As soon as the error hits, do a write (to
remap the block). After all, do you trust a drive to read the sector on the
10th try more than you trust your parity calculations? If the drive's error
rate gets too high, drop the drive like a hot potato and scream bloody murder
so the admin feeds you a disk ASAP.
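The policy above could be sketched roughly like this. It is purely illustrative and not Linux md's actual code: `DriveHealth`, `handle_read_error`, the `max_errors` threshold and the `reconstruct`/`write_sector`/`alert` callbacks are all hypothetical, and timeouts are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DriveHealth:
    """Per-member error bookkeeping (hypothetical structure)."""
    name: str
    max_errors: int = 10  # hypothetical threshold before the drive is dropped
    errors: int = 0
    failed: bool = False


def handle_read_error(
    drive: DriveHealth,
    sector: int,
    reconstruct: Callable[[int], bytes],
    write_sector: Callable[[DriveHealth, int, bytes], None],
    alert: Callable[[str], None],
) -> bytes:
    """Serve a failed read from redundancy instead of retrying the drive."""
    drive.errors += 1
    # Rebuild the sector from the other members (parity) rather than re-reading.
    data = reconstruct(sector)
    if drive.errors > drive.max_errors:
        # Too many errors: stop trusting the drive and tell the admin loudly.
        drive.failed = True
        alert(f"{drive.name}: {drive.errors} read errors, drive dropped; replace it ASAP")
        return data
    # Write the reconstructed data back so the drive can remap the bad sector.
    write_sector(drive, sector, data)
    return data
```

The point of the design is that there is no retry loop at all: the first read error is answered from parity and immediately turned into a write, and the only state kept per drive is a counter that decides when to give up on it.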