[Beowulf] Rsync - checksums
Bill Wichser
bill at princeton.edu
Tue Jun 18 08:05:03 PDT 2019
No. Using the rsync daemon on the receiving end.
Bill
On 6/18/19 11:03 AM, Stu Midgley wrote:
> Are you rsyncing over ssh? If so, get HPN-SSH and use the none cipher.
> MUCH faster again :)
>
> On Tue, Jun 18, 2019 at 11:00 PM Bill Wichser <bill at princeton.edu> wrote:
>
> Well thanks for THAT pointer! Using --checksum-choice=none results in
> a speedup of somewhere between 2 and 3 times. That's my validation of
> the checksum theory things have been pointing towards. Now to get
> xxhash into rsync and I think we are all set.
>
> Thanks,
> Bill
>
> On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
> > On 6/18/19 9:16 AM, Bill Wichser wrote:
> >> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64. We've tried a
> >> number of recompiles. gcc, Intel. The only thing between identical
> >> compiles was the md4 vs md5.
> >>
> >> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
> >> --delete-after --files-from=...
> >>
> >> I'm not asking for help. Just if anyone had attempted to change the
> >> algorithm into something much faster.
> >>
> >> I refer you to this project https://cyan4973.github.io/xxHash/ where
> >> there is a table of speeds. Regardless of what anyone might
> >> speculate, we are pursuing this route of changing out the algorithm.
> >> Maybe it's all for naught. Maybe it isn't. But in a few weeks
> >> hopefully we'll have determined.
> >
> > Very interesting. From the rsync man page:
> >
> > "Note that rsync always verifies that each transferred file was
> > correctly reconstructed on the receiving side by checking a
> > whole-file checksum that is generated as the file is transferred, but
> > that automatic after-the-transfer verification has nothing to do with
> > this option’s before-the-transfer "Does this file need to be updated?"
> > check."
> >
> > So it sounds like you have sufficient churn in large files that the
> > checksum validation post-transfer is your bottleneck. Short of hacking
> > rsync to use a faster algorithm, your remaining choice is to use
> > --checksum-choice=STR and set it to none, and then perform your own
> > hashing out-of-band to check the transferred data using the list you
> > have provided via --files-from. This will nerf rsync's ability to do
> > delta-transfer, which may be ok depending on the nature of your churning
> > files. If your pipes are huge (atypical for DR), your CPU is weak, and
> > your churning data is mostly completely new or completely changed files,
> > --checksum-choice=none may work very well for you.
> >
> > Best,
> >
> > ellis
> >
> --
> Dr Stuart Midgley
> sdm900 at gmail.com
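
The out-of-band hashing suggested in the quoted message (check the transferred data yourself, driven by the same --files-from list) could look roughly like the sketch below. It is only an illustration under stated assumptions: the list holds one relative path per line, Python 3 is available on both ends, and the function names (hash_file, build_manifest, diff_manifests) are made up for this example. SHA-256 from the standard library stands in for whichever hash is preferred; xxhash would need a third-party module.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, files_from):
    """Map each relative path listed in files_from to its digest under root.

    Assumption: one relative path per line. Entries that are missing or
    are not regular files (directories, devices) map to None.
    """
    manifest = {}
    with open(files_from, "r") as listing:
        for line in listing:
            rel = line.rstrip("\n")
            if not rel:
                continue
            full = os.path.join(root, rel.lstrip("/"))
            manifest[rel] = hash_file(full) if os.path.isfile(full) else None
    return manifest


def diff_manifests(source, dest):
    """Return the sorted relative paths whose digests differ between the two sides."""
    return sorted(p for p in source if source[p] != dest.get(p))
```

Each side would run build_manifest against its own copy of the tree, and the two resulting dictionaries would then be compared in one place with diff_manifests; any path it returns is a candidate for a re-transfer. How the manifests get moved between hosts is left open here.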