[Beowulf] Rsync - checksums

Stu Midgley sdm900 at gmail.com
Wed Oct 2 08:33:39 PDT 2019


Thank you.

On Tue, Oct 1, 2019 at 9:26 PM Bill Wichser <bill at princeton.edu> wrote:

> I used xxHash-0.7.0 to build against.  You'll need to grab a version and
> install it.  For rsync itself I have a diff, xxhash.patch, along with
> the rsync rpms, in
>
> https://tigress-web.princeton.edu/~bill/
>
> If I get time I'll try and pass this to the upstream rsync folks.  It is
> performing about the same speed as using --checksum so we are happy.
> This has been in production and seems to work fine.
>
> Bill
>
> On 9/30/19 8:55 PM, Stu Midgley wrote:
> > That's pretty awesome, are you going to make it available?  or push it
> > upstream?
> >
> > If not... how can we get it?
> >
> > On Tue, Oct 1, 2019 at 1:09 AM Bill Wichser <bill at princeton.edu> wrote:
> >
> >     Just wanted to circle back on my original question.  I changed the
> >     rsync code, adding xxhash, and we see about a 3x speedup.  Good
> >     enough, since it is very close to not using any checksum speedups.
> >
> >     Bill
> >
> >     On 6/17/19 9:43 AM, Bill Wichser wrote:
> >      > We have moved to a rsync disk backup system, from TSM tape, in
> >      > order to have a DR for our 10 PB GPFS filesystem.  We looked at
> >      > a lot of options but here we are.
> >      >
> >      > md5 checksums take a lot of compute time with huge files and even
> >      > with millions of smaller ones.  The bulk of the time for running
> >      > rsync is spent in computing the source and destination checksums
> >      > and we'd like to alleviate that pain of a cryptographic algorithm.
> >      >
> >      > Googling around, I found no mention of using a technique like
> >      > this to improve rsync performance.  I did find reference to a few
> >      > hashing algorithms though which could certainly work here (xxhash,
> >      > murmurhash, sbox, cityhash64).
> >      >
> >      > Rsync has certainly been around for a few years!  We are going to
> >      > pursue changing the current checksum algorithm and using something
> >      > much faster.  If anyone has done this already and would like to
> >      > share their experiences that would be wonderful.  Ideally this
> >      > could be some optional plugin for rsync where users could choose
> >      > which checksummer to use.
> >      >
> >      > Bill
> >      > _______________________________________________
> >      > Beowulf mailing list, Beowulf at beowulf.org
> >     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
> >      > To change your subscription (digest mode or unsubscribe) visit
> >      > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
> >
> >
> > --
> > Dr Stuart Midgley
> > sdm900 at gmail.com
>
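
The "optional plugin ... where users could choose which checksummer to
use" idea from the original message could be sketched roughly as below.
This is a Python illustration only: it is not Bill's xxhash.patch and
says nothing about how rsync is implemented internally. The names
CHECKSUMMERS, file_checksum and files_match are made up for the example,
zlib.crc32 is used as a stand-in for a fast non-cryptographic checksum
because it ships with Python, and xxh64 is registered only if the
third-party "xxhash" module happens to be installed.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # read files in 1 MiB pieces so huge files fit in memory


def md5_digest(chunks):
    """Cryptographic baseline: MD5 over an iterable of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def crc32_digest(chunks):
    """Non-cryptographic stand-in: running CRC-32 over the chunks."""
    value = 0
    for chunk in chunks:
        value = zlib.crc32(chunk, value)
    return format(value, "08x")


# Hypothetical registry: algorithm name -> function taking byte chunks.
CHECKSUMMERS = {"md5": md5_digest, "crc32": crc32_digest}

try:
    import xxhash  # third-party and optional; assumed, not required

    def xxh64_digest(chunks):
        h = xxhash.xxh64()
        for chunk in chunks:
            h.update(chunk)
        return h.hexdigest()

    CHECKSUMMERS["xxh64"] = xxh64_digest
except ImportError:
    pass  # fall back to the stdlib choices only


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the file's contents in fixed-size binary chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def file_checksum(path, algorithm="md5"):
    """Checksum one file with the chosen algorithm from the registry."""
    return CHECKSUMMERS[algorithm](read_chunks(path))


def files_match(src, dst, algorithm="md5"):
    """Treat two files as identical if their checksums agree."""
    return file_checksum(src, algorithm) == file_checksum(dst, algorithm)
```

A caller would pick the algorithm by name, e.g.
files_match(src, dst, algorithm="crc32"). Swapping the checksum trades
collision resistance for speed, which is acceptable here only because
the goal is detecting changed files, not defending against tampering.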


-- 
Dr Stuart Midgley
sdm900 at gmail.com