[Beowulf] Rsync - checksums
Bill Wichser
bill at princeton.edu
Tue Jun 18 08:00:16 PDT 2019
Well thanks for THAT pointer! Using --checksum-choice=none results in a
speedup of somewhere between 2 and 3 times. That validates the checksum
theory that things have been pointing towards. Now to get xxhash into
rsync and I think we are all set.
Thanks,
Bill
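
For anyone wanting to sanity-check the same theory on their own hardware, a
rough sketch follows. It only measures single-core hashing throughput with
Python's hashlib, which is an assumption standing in for rsync's internal
checksum code; the function name and buffer sizes are made up for
illustration. xxhash is not in the Python standard library, so it is not
included here. If the MB/s figure for md5 is below what the network and
disks can sustain, the whole-file checksum is a plausible bottleneck.

```python
import hashlib
import time


def hash_throughput_mb_s(algorithm: str, total_mb: int = 64, chunk_kb: int = 256) -> float:
    """Return approximate single-core hashing throughput in MB/s.

    Hypothetical helper: hashes `total_mb` of zero bytes held in memory,
    so disk and network speed are deliberately excluded.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    hasher = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(n_chunks):
        hasher.update(chunk)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed


if __name__ == "__main__":
    # md5 is the algorithm under suspicion; sha256 is shown only for contrast.
    for name in ("md5", "sha256"):
        print(f"{name}: {hash_throughput_mb_s(name):.0f} MB/s")
```

The numbers vary with CPU and with how the local Python and OpenSSL were
built, so treat them as an order-of-magnitude check only.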
On 6/18/19 9:57 AM, Ellis H. Wilson III wrote:
> On 6/18/19 9:16 AM, Bill Wichser wrote:
>> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64. We've tried a
>> number of recompiles: gcc, Intel. The only difference between
>> otherwise identical compiles was md4 vs. md5.
>>
>> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
>> --delete-after --files-from=...
>>
>> I'm not asking for help. Just asking if anyone had attempted to
>> change the algorithm to something much faster.
>>
>> I refer you to this project https://cyan4973.github.io/xxHash/ where
>> there is a table of speeds. Regardless of what anyone might
>> speculate, we are pursuing this route of changing out the algorithm.
>> Maybe it's all for naught. Maybe it isn't. But in a few weeks
>> hopefully we'll have determined.
>
> Very interesting. From the rsync man page:
>
> "Note that rsync always verifies that each transferred file was
> correctly reconstructed on the receiving side by checking a
> whole-file checksum that is generated as the file is transferred, but
> that automatic after-the-transfer verification has nothing to do with
> this option’s before-the-transfer "Does this file need to be updated?"
> check."
>
> So it sounds like you have sufficient churn in large files that the
> checksum validation post-transfer is your bottleneck. Short of hacking
> rsync to use a faster algorithm, your remaining choice is to use
> --checksum-choice=STR, set it to none, and then perform your own
> hashing out-of-band to check the transferred data using the list you
> have provided via --files-from. This will nerf rsync's ability to do
> delta-transfer, which may be ok depending on the nature of your churning
> files. If your pipes are huge (atypical for DR), your CPU is weak, and
> your churning data is mostly completely new or completely changed files,
> --checksum-choice=none may work very well for you.
>
> Best,
>
> ellis
>
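
The out-of-band hashing suggested above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
function names are hypothetical, sha256 is an arbitrary choice of digest,
the list file is assumed to hold one relative path per line (the same file
passed to --files-from), and only regular files are hashed. One manifest
would be built on each side and the two compared afterwards.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in fixed-size chunks so large files are never fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def build_manifest(root, files_from, algorithm="sha256"):
    """Map each listed relative path to its digest under `root`.

    Directories, symlinks, and missing entries are skipped, so a path
    missing on one side shows up later as a mismatch.
    """
    manifest = {}
    with open(files_from) as listing:
        for line in listing:
            rel = line.rstrip("\n")
            if not rel:
                continue
            target = Path(root) / rel.lstrip("/")
            if target.is_file() and not target.is_symlink():
                manifest[rel] = file_digest(target, algorithm)
    return manifest


def mismatches(source_manifest, dest_manifest):
    """Return the sorted paths whose digest differs or is absent on the destination."""
    return sorted(
        rel
        for rel, digest in source_manifest.items()
        if dest_manifest.get(rel) != digest
    )
```

In practice build_manifest would run once on the source host and once on
the destination, with the resulting dictionaries shipped to one place (for
example as JSON) before calling mismatches. Files that change between the
transfer and the hashing pass will be reported as mismatches, which matters
for data with a lot of churn.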