[Beowulf] Rsync - checksums

Benjamin Redling benjamin.rampe at uni-jena.de
Mon Jun 17 16:18:12 PDT 2019


You mean like a COW filesystem with end-to-end checksums, where you can
send snapshots and don't have to care too much about MD5?

I looked it up. Spectrum Scale (formerly GPFS) has end-to-end checksums,
(global) snapshots, and mmapplypolicy to get the list of files to back
up -- at least Commvault, according to their documentation, leverages it
to get the changed files.
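
For illustration only, a minimal Python sketch of the "copy only what
changed" idea. It does not use mmapplypolicy or any GPFS API: it assumes
a plain mtime walk as a stand-in for the policy scan, and the function
names changed_since and write_file_list are made up for this example.

```python
import os


def changed_since(root, since_epoch):
    """Return paths (relative to root) of files modified after since_epoch.

    Assumption: mtime is a good enough change indicator. A real policy
    scan or changelog would be far cheaper on a multi-PB filesystem.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except FileNotFoundError:
                continue  # file disappeared between listing and stat
            if st.st_mtime > since_epoch:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)


def write_file_list(paths, list_path):
    """Write one relative path per line, the layout rsync --files-from reads."""
    with open(list_path, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")
```

The resulting list could then be handed to rsync, e.g.
"rsync -a --files-from=changed.txt /gpfs/project/ backuphost:/dr/project/"
(host and paths are hypothetical), so only the listed files are looked
at instead of the whole tree.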

Now I wonder where that "theory" doesn't match practice...

Over and out.

On 17.06.19 16:39, Michael Di Domenico wrote:
> rsync on 10PB sounds painful.  i haven't used GPFS in a very long
> time, so i might have a gap in knowledge.  but i would be surprised if
> GPFS doesn't have a changelog, where you can watch the files that
> changed through the day and only copy the ones that did?  much like
> what robinhood does for lustre.
> 
> On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <bill at princeton.edu> wrote:
>>
>> We have moved to a rsync disk backup system, from TSM tape, in order to
>> have a DR for our 10 PB GPFS filesystem.  We looked at a lot of options
>> but here we are.
>>
>> md5 checksums take a lot of compute time with huge files and even with
>> millions of smaller ones.  The bulk of the time for running rsync is
>> spent in computing the source and destination checksums and we'd like to
>> alleviate that pain of a cryptographic algorithm.
>>
>> Googling around, I found no mention of using a technique like this to
>> improve rsync performance.  I did find reference to a few hashing
>> algorithms though which could certainly work here (xxhash, murmurhash,
>> sbox, cityhash64).
>>
>> Rsync has certainly been around for a few years!  We are going to pursue
>> changing the current checksum algorithm and using something much faster.
>>   If anyone has done this already and would like to share their
>> experiences that would be wonderful. Ideally this could be some optional
>> plugin for rsync where users could choose which checksummer to use.
>>
>> Bill
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf


-- 
FSU Jena | JULIELab.de/Staff/Redling
☎ +49 3641 9 44323

