Large FOSS filesystems, was Re: [Beowulf] 512 nodes Myrinet cluster Challenges
David Kewley
kewley at gps.caltech.edu
Sun May 7 11:43:17 PDT 2006
On Friday 05 May 2006 11:36, Craig Tierney wrote:
> My concern with this setup isn't xfs; it would be the stability of
> the storage. Also, when there is a disk hiccup (which will happen),
> repairing a 16 TB filesystem takes a long time. With a distributed
> filesystem (PVFS2, Ibrix, etc.) you would only have to fix the one
> volume, not the entire filesystem. There may be some filesystem
> consistency checks after the repair, but not to the extent of a full
> filesystem check.
We have a single 35 TB Ibrix filesystem, served by 16 fileservers and backed
by 64 SAN LUNs on a DataDirect Networks storage array. The fsck protocol
today is to run a full filesystem check first, then do fixes only if
necessary. The per-LUN filesystems are modified ext3, so the "Phase I" fsck
is 64 ext3 fscks run in parallel. The check-only Phase I run takes quite a
while (ext3 fsck is fairly slow). Once the damaged LUN filesystems are
identified, the repairs can optionally be restricted to just those LUNs;
with fewer LUNs being accessed in parallel, the repair run can actually go
much faster than the check-only run. A post-repair consistency check is
usually not necessary (Ibrix tech support advises us what to do in each
case, depending on what the logs show). There are two more fsck phases that
are run separately; the second is somewhat faster than the first, and the
third is very fast.
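The check-then-repair flow above, fanning a check-only pass across all LUNs and then repairing only the ones flagged as damaged, can be sketched roughly as follows. This is a hedged illustration, not Ibrix's actual tooling: the device names are invented, and the check is stubbed so the sketch runs without real block devices (in practice the check would be something like `e2fsck -n -f <device>`, which never modifies the filesystem).

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_check(luns, check):
    """Phase I, check-only: run `check` on every LUN in parallel.

    `check(lun)` returns True if the LUN's filesystem is clean.
    Returns the list of damaged LUNs, preserving input order.
    """
    with ThreadPoolExecutor(max_workers=len(luns)) as pool:
        results = pool.map(check, luns)
    return [lun for lun, clean in zip(luns, results) if not clean]

def repair(damaged, fix):
    """Repair pass restricted to the damaged LUNs.

    Fewer concurrent streams against the array is why this pass can
    finish faster than the all-LUN check-only pass.
    """
    for lun in damaged:
        fix(lun)

if __name__ == "__main__":
    # Hypothetical device names; a real deployment would enumerate the
    # 64 SAN LUNs. The stubbed check pretends two LUNs are damaged.
    luns = [f"/dev/lun{i:02d}" for i in range(64)]
    damaged_set = {"/dev/lun07", "/dev/lun31"}
    bad = parallel_check(luns, lambda lun: lun not in damaged_set)
    print(bad)  # only these LUNs would get the repair pass
```

In real use, `check` would shell out to the check-only fsck and inspect its exit code, and `fix` would run the repairing fsck; the point of the sketch is only the two-pass structure.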
I'll leave out any further details of the Ibrix filesystem architecture and
fsck, since I'm not entirely clear how much they want to keep private in
their conversations with their supported or pre-sales customers. You can
talk to them yourself. :)
David