[Beowulf] dedupe filesystem
Greg Lindahl
lindahl at pbm.com
Mon Jun 8 13:52:48 PDT 2009
>> It might be worth noting that dedup is not intended for high
>> performance file systems ... the cost of computing the hash(es)
>> is(are) huge.
>
> Some file systems do (or claim to do) checksumming for data integrity
> purposes, this seems to me like the perfect place to add the computation
> of a hash - with data in cache (needed for checksumming anyway), the
> computation should be fast.
Filesystems may call it a "checksum" but it's usually a hash. We use a
Jenkins hash, which is fast and a lot better than, say, the TCP
checksum. But it's a lot weaker than an expensive hash.
If your dedup is going to fall back to byte-by-byte comparisons anyway,
a weak hash may well be good enough.
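To illustrate the point, here is a minimal sketch of that scheme: a fast,
weak hash groups candidate blocks, and a byte-by-byte comparison confirms
real duplicates, so a hash collision can never merge two different blocks.
The function name `dedupe` and the use of CRC32 are my own choices for the
example (CRC32 stands in for a fast non-cryptographic hash like Jenkins,
only because it is in the Python standard library):

```python
import zlib

def dedupe(blocks):
    """Deduplicate a list of byte blocks using a weak hash plus
    byte-by-byte verification.

    Returns (unique, refs): unique is the list of distinct blocks,
    refs maps each input block to an index into unique."""
    by_hash = {}   # weak hash -> indices of unique blocks with that hash
    unique = []
    refs = []
    for block in blocks:
        h = zlib.crc32(block)        # cheap, weak hash (stand-in for Jenkins)
        idx = None
        for candidate in by_hash.get(h, []):
            if unique[candidate] == block:   # byte-by-byte fallback
                idx = candidate
                break
        if idx is None:              # new block (or hash collision)
            idx = len(unique)
            unique.append(block)
            by_hash.setdefault(h, []).append(idx)
        refs.append(idx)
    return unique, refs
```

Because correctness rests entirely on the final byte comparison, the hash
only affects how often that comparison runs, not whether dedup is safe.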
-- greg