PVFS fault tolerance?

Jeff Layton jeffrey.b.layton at lmco.com
Wed Feb 6 02:58:34 PST 2002


Ben Ransom wrote:

> I see that PVFS stripes data, but it seems there is no fault
> tolerance.  That is, a node goes down, and your data is not available.  So,
> although IDE drives are pretty reliable, the chance of data loss from a
> PVFS system is essentially the chance of a single node (disk) going down
> times the number of nodes in the PVFS file space.  Not so good.  So, yeah,
> I buy a large capacity tape drive.  But even then, do I have to look at
> restoring the *entire* PVFS volume, which could be very large (say 20 nodes
> times 30 GB) just for losing one node?  Odds are, losing a disk is "when",
> not "if".

I can tell you about our experience and then offer some advice wrt PVFS.

We have PVFS across 64 nodes. During 2 years of 24/7 operation, we have
lost only 2 drives (both during the first 2 months). Then, when the system
was moved (down for 4 days), we lost 2 more. These are all IDE drives. Not
too bad, if you ask me.
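
To put numbers on your point: with independent disk failures, the exact
chance of losing at least one of N disks is 1 - (1 - p)^N, and for small
per-disk probability p that is indeed close to N*p, which is the rule of
thumb you're using. A quick sketch (the 2% per-disk annual failure rate is
a made-up figure, just for illustration):

    # Back-of-envelope: chance of losing at least one disk in a PVFS
    # stripe set, assuming independent disk failures.  The exact answer
    # is 1 - (1 - p)**n; for small p this is close to n*p.

    def stripe_failure_prob(p_disk, n_nodes):
        """Probability that at least one of n_nodes disks fails."""
        return 1.0 - (1.0 - p_disk) ** n_nodes

    p = 0.02  # assumed 2% per-disk annual failure rate (made-up figure)
    for n in (1, 20, 64):
        print("%2d nodes: exact %.3f   n*p approx %.3f"
              % (n, stripe_failure_prob(p, n), n * p))

Note that the N*p shortcut overshoots once N*p gets large (at 64 nodes it
even exceeds 1); the exact figure saturates toward certainty instead.
Either way, the more nodes you stripe across, the sooner you should expect
to lose a disk.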

Now, my advice: treat PVFS as a high-speed -temporary- filesystem. Use it
to stream your data, then move the files off it onto a different
filesystem. That's the way the developers intended the filesystem to be
used.
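
In practice that just means a stage-out step at the end of each job.
Something like this keeps nothing irreplaceable sitting on PVFS (the
paths are placeholders; substitute your own mount points):

    # Sketch of the stream-then-archive pattern.  Paths are
    # hypothetical: /mnt/pvfs is the PVFS mount point, and
    # /home/user/results lives on a slower filesystem that is backed up.
    import os
    import shutil

    SCRATCH = "/mnt/pvfs/run42"           # fast, striped, not fault tolerant
    ARCHIVE = "/home/user/results/run42"  # slow, but survives a dead node

    os.makedirs(ARCHIVE, exist_ok=True)
    for name in os.listdir(SCRATCH):
        src = os.path.join(SCRATCH, name)
        if os.path.isfile(src):
            # Copy each output file off the striped scratch space, then
            # delete it so the next run starts with clean scratch.
            shutil.copy2(src, os.path.join(ARCHIVE, name))
            os.remove(src)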

There are alternatives if you want a distributed filesystem, but they are
usually very expensive: GFS (not ready for prime time, IMHO) and IBM's
distributed filesystem for Linux (GPFS). In either case you're looking at
SCSI disks in each node plus Fibre Channel (although I think GFS can work
with plain NICs).

Good Luck!

Jeff


>
>
> -Ben Ransom
>   UC Davis
>



