[Beowulf] Rsync - checksums

Stu Midgley sdm900 at gmail.com
Mon Jun 17 23:30:55 PDT 2019


I use xxHash (https://github.com/Cyan4973/xxHash) for hashing... much
faster.
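"Much faster" is easy to check on your own hardware. Below is a minimal timing sketch that assumes only the Python standard library. xxHash is not in the standard library, so `zlib.crc32` stands in as the non-cryptographic candidate. If the third-party `xxhash` package is installed, an entry such as `xxhash.xxh64(d).digest()` could be added to `candidates`. `time_hash` and `compare` are hypothetical helper names. Throughput depends on the CPU and on data size, so no figures are quoted here.

```python
import hashlib
import time
import zlib


def time_hash(fn, data: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) of fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare(data: bytes) -> dict:
    candidates = {
        # Cryptographic baseline: the MD5 discussed in this thread.
        "md5": lambda d: hashlib.md5(d).digest(),
        # Standard-library, non-cryptographic stand-in for xxHash
        # (assumption: xxHash itself needs a third-party binding).
        "crc32": lambda d: zlib.crc32(d),
    }
    return {name: time_hash(fn, data) for name, fn in candidates.items()}


if __name__ == "__main__":
    buf = bytes(range(256)) * (64 * 1024 * 4)  # 64 MiB in-memory test buffer
    for name, secs in compare(buf).items():
        print(f"{name:6s} {len(buf) / secs / 2**20:8.1f} MiB/s")
```

This measures hashing only, with the data already in memory. On a real backup run, disk and network I/O may dominate. For deciding whether two copies of a file differ, a 64- or 128-bit hash such as xxHash's is a safer choice than a 32-bit CRC.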

On Tue, Jun 18, 2019 at 7:18 AM Benjamin Redling <benjamin.rampe at uni-jena.de> wrote:

> You mean like a COW filesystem with end-to-end checksums where you can
> send snapshots and don't care too much about MD5?
>
> I looked it up. Spectrum Scale (formerly GPFS) has end-to-end checksums,
> (global) snapshots, and mmapplypolicy to get the list of files to back
> up; at least Commvault, according to its documentation, leverages it
> to get the changed files.
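mmapplypolicy selects files by evaluating rules against file metadata rather than by reading file contents. The sketch below is a generic, non-GPFS stand-in for that idea. It walks a tree and lists regular files whose modification time is newer than the last backup. `changed_since` and the `/gpfs/project` path are hypothetical, and on a 10 PB filesystem the parallel policy scan is likely to be far faster than a serial `os.walk`.

```python
import os
import stat
import time
from typing import Iterator


def changed_since(root: str, cutoff: float) -> Iterator[str]:
    """Yield paths, relative to root, of regular files with mtime > cutoff.

    Only metadata is read (one lstat per entry); no file contents are hashed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except FileNotFoundError:
                continue  # removed between directory listing and lstat
            if stat.S_ISREG(st.st_mode) and st.st_mtime > cutoff:
                yield os.path.relpath(full, root)


if __name__ == "__main__":
    # Hypothetical usage: files under /gpfs/project changed in the last 24 h.
    last_run = time.time() - 24 * 3600
    for path in changed_since("/gpfs/project", last_run):
        print(path)
```

A pure mtime test can miss files whose mtime was preserved or set back by tools such as `cp -p` or `tar`. Also comparing `st_ctime`, which ordinary tools cannot set, is a common safeguard.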
>
> Now I wonder where that "theory" doesn't match practice...
>
> Over and out.
>
> On 17.06.19 16:39, Michael Di Domenico wrote:
> > rsync on 10 PB sounds painful. I haven't used GPFS in a very long
> > time, so I might have a gap in my knowledge, but I would be surprised
> > if GPFS doesn't have a changelog where you can watch the files that
> > changed through the day and copy only those, much like what Robinhood
> > does for Lustre.
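A changelog consumer typically produces a stream of changed paths, often with many repeats of the same file. The minimal sketch below assumes the changelog has already been reduced to absolute paths. That event format is hypothetical and is not GPFS's or Robinhood's. The sketch collapses the stream into a unique list of relative paths and builds an rsync command that reads this list through `--files-from=-`, so rsync considers only the listed files. `files_from_changelog` and `rsync_command` are illustrative names.

```python
import os
from typing import Iterable, List


def files_from_changelog(events: Iterable[str], root: str) -> List[str]:
    """Collapse changed absolute paths into a sorted, de-duplicated list of
    paths relative to root, the form rsync's --files-from expects."""
    root = os.path.normpath(root)
    rel = set()
    for path in events:
        path = os.path.normpath(path)
        if os.path.commonpath([root, path]) != root:
            continue  # outside the tree being backed up
        r = os.path.relpath(path, root)
        if r != ".":  # skip the root directory itself
            rel.add(r)
    return sorted(rel)


def rsync_command(src: str, dst: str) -> List[str]:
    # "--files-from=-" makes rsync read the file list from stdin;
    # listed paths are interpreted relative to src.
    return ["rsync", "-a", "--files-from=-", src, dst]


if __name__ == "__main__":
    events = ["/gpfs/project/a.dat", "/gpfs/project/a.dat",
              "/gpfs/project/run/b.log", "/elsewhere/c.txt"]
    print(files_from_changelog(events, "/gpfs/project"))
```

To run the transfer, the list can be fed on stdin, for example `subprocess.run(rsync_command(src, dst), input="\n".join(paths) + "\n", text=True, check=True)`. Deletions still need separate handling, because rsync reports an error for a listed path that no longer exists on the source.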
> >
> > On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <bill at princeton.edu> wrote:
> >>
> >> We have moved from TSM tape to an rsync-based disk backup system in
> >> order to have DR for our 10 PB GPFS filesystem. We looked at a lot of
> >> options, but here we are.
> >>
> >> MD5 checksums take a lot of compute time, both with huge files and
> >> with millions of smaller ones. The bulk of rsync's run time is spent
> >> computing the source and destination checksums, and we'd like to
> >> avoid the cost of a cryptographic algorithm.
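rsync's delta algorithm already uses a two-tier design, as described in Tridgell and Mackerras's rsync technical report. A cheap weak checksum is rolled over one copy of the file, one byte at a time, at O(1) cost per step. Only blocks whose weak checksum matches are then confirmed with the strong hash, which is the expensive part this thread proposes replacing. The minimal Python sketch below implements the report's weak checksum. It illustrates the formulas and is not claimed to be bit-compatible with rsync's C implementation.

```python
from typing import Iterator, Tuple

M = 1 << 16  # modulus used in the rsync technical report


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute (a, b) for a block directly, in O(len(block))."""
    n = len(block)
    a = sum(block) % M
    # Byte i (0-based) is weighted by n - i, so the first byte counts n times.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, s) for every n-byte window, where s = a + 2**16 * b."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, a + (b << 16)
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        yield k + 1, a + (b << 16)


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog"
    for off, s in rolling_checksums(data, 8):
        # Each rolled value must equal a from-scratch computation.
        assert (s & 0xFFFF, s >> 16) == weak_checksum(data[off:off + 8])
    print("rolling updates match direct computation")
```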
> >>
> >> Googling around, I found no mention of using a technique like this to
> >> improve rsync performance. I did find references to a few hashing
> >> algorithms, though, that could certainly work here (xxhash,
> >> murmurhash, sbox, cityhash64).
> >>
> >> Rsync has certainly been around for a few years! We are going to
> >> pursue replacing the current checksum algorithm with something much
> >> faster. If anyone has done this already and would like to share their
> >> experiences, that would be wonderful. Ideally, this could be an
> >> optional rsync plugin that lets users choose which checksum algorithm
> >> to use.
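An optional "checksummer plugin" can be sketched as a registry that maps a name to a hash factory. The minimal Python sketch below is illustrative only: `CHECKSUMMERS`, `Crc32` and `file_checksum` are hypothetical names and not an rsync interface. Any object exposing `update()` and `hexdigest()` can be registered, which is the interface Python's `hashlib` objects provide. If the third-party `xxhash` package is installed, `xxhash.xxh64` exposes the same methods and could be added as another entry.

```python
import hashlib
import zlib
from typing import Callable, Dict, Protocol


class Hasher(Protocol):
    def update(self, data: bytes) -> None: ...
    def hexdigest(self) -> str: ...


class Crc32:
    """hashlib-style wrapper around zlib.crc32 (non-cryptographic, 32-bit)."""

    def __init__(self) -> None:
        self._crc = 0

    def update(self, data: bytes) -> None:
        # zlib.crc32 accepts a running value, so chunks can be fed incrementally.
        self._crc = zlib.crc32(data, self._crc)

    def hexdigest(self) -> str:
        return f"{self._crc:08x}"


# Hypothetical registry: checksum name -> zero-argument factory.
CHECKSUMMERS: Dict[str, Callable[[], Hasher]] = {
    "md5": hashlib.md5,
    "crc32": Crc32,
}


def file_checksum(path: str, algo: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks with the named algorithm."""
    if algo not in CHECKSUMMERS:
        raise ValueError(f"unknown checksum {algo!r}; choose from {sorted(CHECKSUMMERS)}")
    h = CHECKSUMMERS[algo]()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Reading the file in fixed-size chunks keeps memory use flat even for very large files. Both ends of a transfer must, of course, select the same algorithm for their checksums to be comparable.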
> >>
> >> Bill
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
>
>
> --
> FSU Jena | JULIELab.de/Staff/Redling
> ☎ +49 3641 9 44323
>


-- 
Dr Stuart Midgley
sdm900 at gmail.com

