[Beowulf] Rsync - checksums

Ellis H. Wilson III ellis at ellisv3.com
Tue Jun 18 06:57:34 PDT 2019

On 6/18/19 9:16 AM, Bill Wichser wrote:
> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64.  We've tried a number 
> of recompiles.  gcc, Intel.  The only thing between identical compiles 
> was the md4 vs md5.
> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete 
> --delete-after --files-from=...
> I'm not asking for help.  Just if anyone had attempted to change the 
> algorithm into something much faster.
> I refer you to this project https://cyan4973.github.io/xxHash/ where 
> there is a table of speeds.  Regardless of what anyone might speculate, 
> we are pursuing this route of changing out the algorithm.  Maybe it's 
> all for naught.  Maybe it isn't.  But in a few weeks hopefully we'll 
> have determined.

Very interesting.  From the rsync man page:

"Note that rsync always verifies that each transferred file was 
correctly reconstructed on the receiving side by checking a
whole-file checksum that is generated as the file is transferred, but
that automatic after-the-transfer verification has nothing to do with
this option’s before-the-transfer "Does this file need to be updated?"
check."

So it sounds like you have enough churn in large files that the
post-transfer checksum validation is your bottleneck.  Short of hacking
rsync to use a faster algorithm, your remaining choice is to pass
--checksum-choice=none and then perform your own hashing out-of-band to
check the transferred data, using the list you already provide via
--files-from.  This will nerf rsync's ability to do delta-transfer,
which may be ok depending on the nature of your churning files.  If your
pipes are huge (atypical for DR), your CPU is weak, and your churning
data is mostly completely new or completely changed files,
--checksum-choice=none may work very well for you.
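
To make the out-of-band hashing idea concrete, here is a minimal Python
sketch, not a tested tool.  It assumes the --files-from list holds one
path per line, relative to a root directory, and that the same list is
readable on both sides.  The names hash_file, build_manifest and compare
are made up for illustration, and blake2b is only a stand-in from the
standard library; a faster hash such as xxHash would need a third-party
module.

```python
import hashlib
import os


def hash_file(path, algo="blake2b", chunk_size=1 << 20):
    """Return the hex digest of one file, read in fixed-size chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(list_path, root, algo="blake2b"):
    """Map each path named in list_path to the digest of root/path.

    Assumption: one path per line.  Directories, symlinks and missing
    entries are skipped, so only regular files end up in the manifest.
    """
    manifest = {}
    with open(list_path, encoding="utf-8") as listing:
        for line in listing:
            rel = line.rstrip("\n")
            if not rel:
                continue
            full = os.path.join(root, rel.lstrip("/"))
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[rel] = hash_file(full, algo)
    return manifest


def compare(source, dest):
    """Return sorted paths that are missing or differ on the destination."""
    return sorted(p for p, digest in source.items() if dest.get(p) != digest)
```

In practice you would run build_manifest on each host after the rsync
pass, ship one manifest to the other side, and re-send whatever compare
returns.  Hashing outside rsync also means it can be spread across
cores or deferred to a quieter time.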



Ellis H. Wilson III, Ph.D.
