[Beowulf] Re: dedupe filesystem

Gerry Creager gerry.creager at tamu.edu
Mon Jun 29 06:57:19 PDT 2009


Dave Love wrote:
> Ashley Pittman <ashley at pittman.co.uk> writes:
> 
>> If you relied on the md5 sum alone there would be collisions and those
>> collisions would result in you losing data.
> 
> The question is whether the probability of collisions is high compared
> with other causes -- presumably hardware, assuming no-one puts figures
> on the software reliability.  As far as I remember, the calculation for
> SHA-1 for Plan 9's Venti¹, which no-one seems to have mentioned, says
> ignore collisions for petabyte filesystems.
> 
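
For a rough feel of the numbers being argued about, the usual birthday-bound estimate can be sketched as below. This is only a back-of-the-envelope sketch, not the calculation from the Venti paper: the 8 KiB block size, the 1 PB store size, and the function name are assumptions picked for illustration, and the formula treats the hash as a random function, so it says nothing about deliberately constructed collisions.

```python
import math

def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation for at least one collision among
    num_blocks distinct blocks hashed to hash_bits bits:
        p ~= 1 - exp(-n * (n - 1) / 2**(hash_bits + 1))
    Assumes the hash output is uniformly random (non-adversarial data).
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

PETABYTE = 10 ** 15        # assumed store size, in bytes
BLOCK_SIZE = 8 * 1024      # assumed dedupe block size, in bytes
blocks = PETABYTE // BLOCK_SIZE

for name, bits in (("MD5", 128), ("SHA-1", 160)):
    p = collision_probability(blocks, bits)
    print(f"{name:5s} ({bits} bits): p ~ {p:.3e}")
```

Under those assumptions the estimate comes out many orders of magnitude below any plausible hardware error rate for both hashes, with SHA-1 far smaller than MD5 because of its extra 32 bits.
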
> Ob-Beowulf:  You can run Venti on GNU/Linux,² but I don't know how the
> current implementation performs.  Also, GlusterFS has a `data
> de-duplication translator' on its roadmap, which I didn't see mentioned.

Our initial results with a GlusterFS implementation led us back to NFS.
Who's got a really successful GlusterFS implementation working?

> --
> 1. http://plan9.bell-labs.com/sys/doc/venti/venti.html
> 2. http://swtch.com/plan9port/
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


