[Beowulf] Big storage

Leif Nixon nixon at nsc.liu.se
Fri Sep 14 02:21:14 PDT 2007

Loic Tortay <tortay at cc.in2p3.fr> writes:

> During the last HEPiX meeting, Peter Kelemen mentioned something told 
> to him by a ZFS developer (Jeff Bonwick, if I'm not mistaken) about 
> data corrupted by a Fibre Channel HBA during transfer between disk and 
> host.  ZFS, reportedly, detected (and corrected) the corruption.
> Of course a ZFS developer may be biased.

AFAIU, ZFS is designed specifically to handle such situations, but I'd
like to see large-scale tests over a range of different hardware.

> I'm probably mis-remembering some of the technical details about this, 
> since they seem quite unlikely now (something about the laser beam 
> being somehow "corrupted", but I think this would be detected by the 
> Fibre Channel link protocols or upper-layer checksums).

Yeah, I guess it should. But we recently lost 11 TB of data due to a FC
switch port silently trashing a small proportion of the data passing
through it. (Quite possibly ZFS would have saved us.) And I've seen
three similar incidents at other places in the last few months. So I
have turned up my cynicism knob a few more notches.
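
To illustrate the end-to-end checksumming idea behind the ZFS remark
above, here is a minimal sketch. It is not ZFS code: the function names
and the choice of SHA-256 are assumptions made for the example. The
point is only that a checksum computed on the host before the data
leaves it, and checked again after the data comes back, catches
corruption introduced anywhere in between (HBA, switch port, cable,
disk).

```python
import hashlib


def write_block(data):
    """Return the block and a checksum computed on the host (hypothetical helper)."""
    return data, hashlib.sha256(data).hexdigest()


def flip_bit(data, index):
    """Simulate a component in the data path silently flipping one bit."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


def verify_block(data, expected):
    """Recompute the checksum on the host after reading the block back."""
    return hashlib.sha256(data).hexdigest() == expected


block, checksum = write_block(b"event data " * 1000)
damaged = flip_bit(block, 4242)

print(verify_block(block, checksum))    # intact block passes
print(verify_block(damaged, checksum))  # single flipped bit is caught
```

Detection alone only says that a block is bad. Repairing it, as ZFS
reportedly did, also needs a redundant good copy (mirror or parity) to
read from.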

