[Beowulf] Rsync - checksums
Ellis H. Wilson III
ellis at ellisv3.com
Tue Jun 18 06:57:34 PDT 2019
On 6/18/19 9:16 AM, Bill Wichser wrote:
> Stock RH 7 version, rsync-3.1.2-6.el7_6.1.x86_64. We've tried a number
> of recompiles. gcc, Intel. The only thing between identical compiles
> was the md4 vs md5.
> /bin/rsync -lptgoDAH -v --numeric-ids -d --relative --delete
> --delete-after --files-from=...
> I'm not asking for help. Just if anyone had attempted to change the
> algorithm into something much faster.
> I refer you to this project https://cyan4973.github.io/xxHash/ where
> there is a table of speeds. Regardless of what anyone might speculate,
> we are pursuing this route of changing out the algorithm. Maybe it's
> all for naught. Maybe it isn't. But in a few weeks hopefully we'll
> have determined.
Very interesting. From the rsync man page, under --checksum:
"Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a
whole-file checksum that is generated as the file is transferred, but
that automatic after-the-transfer verification has nothing to do with
this option’s before-the-transfer "Does this file need to be updated?"
check."
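The idea of a whole-file checksum "generated as the file is
transferred" can be sketched roughly as below. This is a conceptual
Python illustration, not rsync's C code. The function names are
hypothetical, and MD5 is the default here only because md4 vs md5 is
what this thread is comparing. In real rsync, part of the written data
may come from the receiver's old copy of the file (delta transfer); the
point is only that the digest is taken over the reconstructed bytes as
they are written, so verification needs no second read pass. Changing
the algorithm argument (e.g. "blake2b") is the stdlib-level analogue of
the algorithm swap being discussed; xxHash itself is not in hashlib.

```python
import hashlib
from typing import BinaryIO, Iterable


def sender_digest(chunks: Iterable[bytes], algorithm: str = "md5") -> str:
    """Sending side: running digest over the bytes it sends."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def receive_and_verify(chunks: Iterable[bytes], out: BinaryIO,
                       expected_hex: str, algorithm: str = "md5") -> bool:
    """Receiving side (hypothetical sketch): write each chunk and fold it
    into a running hash in the same pass, then compare whole-file digests."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        out.write(chunk)   # reconstruct the file
        h.update(chunk)    # hash exactly what was written, no re-read later
    return h.hexdigest() == expected_hex
```

For example, feeding the same list of chunks to sender_digest and then
to receive_and_verify with an io.BytesIO target returns True, and
flipping any byte in one chunk makes it return False. Because the hash
runs in the same loop as the write, its per-byte cost adds directly to
transfer time, which is how a slow digest can become the bottleneck on
a fast link.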
So it sounds like you have enough churn in large files that the
post-transfer checksum validation is your bottleneck. Short of hacking
rsync to use a faster algorithm, your remaining choice is to pass
--checksum-choice=STR with STR set to none, and then perform your own
hashing out-of-band to check the transferred data, using the list you
have provided via --files-from. Setting it to none also forces
--whole-file, which nerfs rsync's delta-transfer entirely; that may be
ok depending on the nature of your churning files. If your pipes are
huge (atypical for disaster recovery), your CPU is weak, and your
churning data is mostly entirely new or entirely changed files,
--checksum-choice=none may work very well for you.
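A minimal sketch of that out-of-band check, in Python, assuming the
--files-from list holds one relative path per line (blank lines
skipped) and that each path resolves under the source root on the
sending side and under the destination root on the receiving side. The
function names and the list format handling are hypothetical; SHA-256
is just a stdlib default, and any hashlib algorithm (or a third-party
xxHash binding, not shown) could be substituted.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files never sit fully in memory


def file_digest(path, algorithm="sha256"):
    """Hash one file incrementally with any algorithm hashlib knows."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(block)
    return h.hexdigest()


def read_file_list(list_path):
    """Read one relative path per line, skipping blank lines (assumed format)."""
    with open(list_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def build_manifest(root, rel_paths, algorithm="sha256"):
    """Map each listed path that is a regular file under root to its digest.
    Directories (e.g. those sent with -d) and missing paths are left out."""
    root = Path(root)
    return {
        p: file_digest(root / p, algorithm)
        for p in rel_paths
        if (root / p).is_file()
    }


def mismatches(src_manifest, dst_manifest):
    """Source paths whose destination digest differs or is missing."""
    return sorted(p for p, digest in src_manifest.items()
                  if dst_manifest.get(p) != digest)
```

In use, you would run build_manifest on each side against the same
list, ship the source manifest across (for example as JSON), and
re-send whatever mismatches reports, with --checksum-choice=none added
to the rsync command, assuming your rsync build supports that option.
Since the hashing now happens outside rsync, it can also be spread
across files or hosts, which is where this route might win back time.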
Ellis H. Wilson III, Ph.D.