[Beowulf] Rsync - checksums

John Hearns hearnsj at googlemail.com
Mon Jun 17 08:04:33 PDT 2019


It's probably best to ask this question on the GPFS mailing list.

A bit of Googling reminded me of https://www.arcastream.com/, who are
active in the UK academic community, though I'm not sure about your
neck of the woods. Give them a shout anyway and ask for Steve Mackie.
http://arcastream.com/what-we-do/

On Mon, 17 Jun 2019 at 15:39, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:

> rsync on 10 PB sounds painful.  I haven't used GPFS in a very long
> time, so I might have a gap in knowledge, but I would be surprised if
> GPFS didn't have a changelog where you can watch which files changed
> through the day and copy only those, much like what Robinhood does
> for Lustre.
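A minimal Python sketch of the changelog approach, assuming a hypothetical changelog with one "<OP> <path>" event per line (OP being CREATE, MODIFY or DELETE) and paths relative to the source directory. This is not the actual GPFS or Robinhood format; it only shows how a day's events collapse into the list of files that still need copying, written in the one-path-per-line form that rsync's --files-from option reads. The function names are illustrative.

```python
from pathlib import Path


def files_to_copy(changelog_lines):
    """Collapse changelog events into the sorted paths that still need copying.

    Assumes a hypothetical format of one "<OP> <path>" event per line,
    where OP is CREATE, MODIFY or DELETE and paths are relative to the
    source directory.
    """
    pending = set()
    for line in changelog_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        op, path = line.split(maxsplit=1)  # maxsplit keeps spaces in paths
        if op == "DELETE":
            pending.discard(path)  # gone now, so nothing to copy
        else:
            pending.add(path)  # CREATE or MODIFY: (re)copy it
    return sorted(pending)


def write_files_from(paths, list_file):
    """Write one path per line, the format rsync's --files-from reads."""
    Path(list_file).write_text("".join(p + "\n" for p in paths))
```

After write_files_from(files_to_copy(events), "changed.txt"), a command such as

    rsync -a --files-from=changed.txt /gpfs/src/ backup:/dst/

would copy only the listed files. Propagating the deletions themselves is left out of this sketch.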
>
> On Mon, Jun 17, 2019 at 9:44 AM Bill Wichser <bill at princeton.edu> wrote:
> >
> > We have moved from TSM tape to an rsync-based disk backup system in
> > order to have disaster recovery (DR) for our 10 PB GPFS filesystem.
> > We looked at a lot of options, but here we are.
> >
> > MD5 checksums take a lot of compute time, both for huge files and for
> > millions of smaller ones.  The bulk of rsync's run time is spent
> > computing the source and destination checksums, and we'd like to
> > avoid the cost of a cryptographic algorithm.
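For context on where the checksum cost sits: rsync's delta-transfer algorithm, as described by Tridgell and Mackerras, pairs a cheap "weak" rolling checksum with a strong per-block hash (MD4 in older releases, MD5 since rsync 3.0). The Python sketch below implements only the weak checksum and its constant-time roll, to show why that part is cheap; it illustrates the published algorithm and is not rsync's C code.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_checksum(block):
    """Compute (a, b) for one block from scratch, per the rsync paper."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte right in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def rolling_checksums(data, block_len):
    """Yield (offset, (a, b)) for every block_len-byte window of data."""
    if len(data) < block_len:
        return
    a, b = weak_checksum(data[:block_len])
    yield 0, (a, b)
    for k in range(1, len(data) - block_len + 1):
        a, b = roll(a, b, data[k - 1], data[k - 1 + block_len], block_len)
        yield k, (a, b)


def as_32bit(a, b):
    """Pack the pair into the single 32-bit value s = a + 2**16 * b."""
    return a | (b << 16)
```

In the algorithm, the strong hash is computed only for windows whose weak checksum matches one of the receiver's blocks, so the expensive cryptographic step is the MD5 part discussed above, not the rolling checksum.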
> >
> > Googling around, I found no mention of swapping in a faster,
> > non-cryptographic hash to improve rsync performance.  I did find
> > references to a few hashing algorithms that could certainly work here
> > (xxhash, murmurhash, sbox, cityhash64).
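A rough way to measure the gap between a cryptographic hash and cheaper checksums on a given machine. Python's standard library includes none of the hashes named above, so this sketch uses zlib's CRC-32 and Adler-32 as stand-ins for the non-cryptographic side. Results depend on the CPU and Python build, so none are quoted here, and the timing covers in-memory hashing only, not the file I/O that also matters on a parallel filesystem.

```python
import hashlib
import os
import time
import zlib

# MD5 is the algorithm the message wants to get away from; CRC-32 and
# Adler-32 stand in for the faster non-cryptographic hashes named there,
# which are not in Python's standard library.
CANDIDATES = {
    "md5": lambda buf: hashlib.md5(buf).digest(),
    "crc32": zlib.crc32,
    "adler32": zlib.adler32,
}


def best_time(fn, buf, repeats):
    """Best-of-N wall-clock seconds for a single fn(buf) call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(buf)
        best = min(best, time.perf_counter() - start)
    return max(best, 1e-9)  # guard against a zero reading on tiny inputs


def throughput_mb_s(size_mb=64, repeats=3):
    """Hash the same random in-memory buffer with each candidate; return MB/s."""
    buf = os.urandom(size_mb * 1024 * 1024)
    return {name: size_mb / best_time(fn, buf, repeats)
            for name, fn in CANDIDATES.items()}


if __name__ == "__main__":
    for name, rate in sorted(throughput_mb_s().items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {rate:10.1f} MB/s")
```

Taking the best of several runs reduces noise from other activity on the node; a fair comparison of xxhash or cityhash64 would need their own bindings timed the same way.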
> >
> > Rsync has certainly been around for a few years!  We are going to pursue
> > replacing the current checksum algorithm with something much faster.
> > If anyone has already done this and would like to share their
> > experiences, that would be wonderful.  Ideally this would be an optional
> > plugin for rsync that lets users choose which checksummer to use.
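The "choose your checksummer" idea can be sketched in Python as a registry of hashlib-style factories, with files streamed in chunks so huge files are never held in memory at once. CHECKSUMMERS, register(), checksum_chunks() and the Crc32Hasher wrapper are hypothetical names for illustration only; rsync itself is written in C and this is not an rsync interface.

```python
import hashlib
import zlib


class Crc32Hasher:
    """Wrap zlib.crc32 in the update()/hexdigest() interface hashlib uses."""

    def __init__(self):
        self._crc = 0

    def update(self, chunk):
        self._crc = zlib.crc32(chunk, self._crc)  # continue the running CRC

    def hexdigest(self):
        return format(self._crc, "08x")


# Hypothetical registry: name -> zero-argument factory for a hasher object.
CHECKSUMMERS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "crc32": Crc32Hasher,
}


def register(name, factory):
    """Make another checksummer selectable by name."""
    CHECKSUMMERS[name] = factory


def checksum_chunks(chunks, algorithm="md5"):
    """Feed an iterable of byte chunks to the chosen checksummer."""
    try:
        hasher = CHECKSUMMERS[algorithm]()
    except KeyError:
        raise ValueError(f"unknown checksum algorithm: {algorithm!r}") from None
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Checksum a file in chunk_size pieces so it never sits in memory whole."""
    with open(path, "rb") as fh:
        return checksum_chunks(iter(lambda: fh.read(chunk_size), b""), algorithm)
```

Assuming the third-party xxhash Python package is installed, register("xxh64", xxhash.xxh64) would add xxHash64 alongside the built-in choices, since its hasher objects expose the same update()/hexdigest() methods. Keeping the hashing separate from the file reading also makes each checksummer easy to test on in-memory data.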
> >
> > Bill
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>