[Beowulf] Rsync - checksums
bill at princeton.edu
Mon Jun 17 06:43:51 PDT 2019
We have moved from TSM tape to an rsync-based disk backup system in order to
have disaster recovery (DR) for our 10 PB GPFS filesystem. We looked at a lot
of options, but here we are.
MD5 checksums take a lot of compute time, both for huge files and for
millions of smaller ones. The bulk of rsync's run time is spent computing
the source and destination checksums, and we'd like to avoid paying for a
cryptographic algorithm there.
Googling around, I found no mention of using a technique like this to
improve rsync performance. I did find references to a few hashing
algorithms that could certainly work here (xxhash, murmurhash, …).
Rsync has certainly been around for a few years! We are going to pursue
replacing the current checksum algorithm with something much faster.
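Before touching rsync itself, a rough throughput comparison can show how much a faster checksum might save. This is a minimal Python sketch using only the standard library: `zlib.crc32` stands in for xxhash/murmurhash (which are third-party packages), and the function names and 64 MB test buffer are hypothetical choices for illustration. It measures raw hashing speed on an in-memory buffer, not rsync's end-to-end run time.

```python
import hashlib
import time
import zlib


def md5_digest(data: bytes) -> str:
    # Cryptographic hash, the kind of checksum the post says rsync uses today.
    return hashlib.md5(data).hexdigest()


def crc32_digest(data: bytes) -> str:
    # Fast non-cryptographic checksum from the standard library; a stand-in
    # (assumption) for xxhash/murmurhash, which are not in the stdlib.
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def throughput_mb_s(fn, data: bytes, repeats: int = 5) -> float:
    """Return the best observed throughput of fn over data, in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero-length timing on very small inputs.
    return len(data) / 1e6 / max(best, 1e-9)


if __name__ == "__main__":
    buf = b"\x00" * (64 * 1024 * 1024)  # 64 MiB of hypothetical test data
    for name, fn in [("md5", md5_digest), ("crc32", crc32_digest)]:
        print(f"{name}: {throughput_mb_s(fn, buf):.0f} MB/s")
```

One caveat on the stand-in: CRC-32 yields only 32 bits, so a changed file is more likely to slip through undetected than with a 64- or 128-bit non-cryptographic hash such as xxhash, which matters across millions of files.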
If anyone has done this already and would like to share their
experiences, that would be wonderful. Ideally this could be an optional
plugin for rsync, where users could choose which checksummer to use.
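The optional-plugin idea can be illustrated outside rsync with a small registry that maps checksummer names to incremental hashers. This is a hypothetical Python sketch, not part of rsync: `CHECKSUMMERS`, `register_checksummer`, `Crc32Hasher`, and `file_checksum` are illustrative names, files are streamed in 1 MiB chunks, and an xxhash entry is registered only if the third-party `xxhash` Python package happens to be installed.

```python
import hashlib
import zlib
from typing import Any, Callable, Dict

CHUNK_SIZE = 1 << 20  # stream files 1 MiB at a time instead of loading them whole


class Crc32Hasher:
    """Incremental CRC-32 with the same update()/hexdigest() shape as hashlib objects."""

    def __init__(self) -> None:
        self._value = 0

    def update(self, data: bytes) -> None:
        # zlib.crc32 accepts the running value, so chunks can be fed one at a time.
        self._value = zlib.crc32(data, self._value)

    def hexdigest(self) -> str:
        return format(self._value & 0xFFFFFFFF, "08x")


# Hypothetical registry: checksummer name -> zero-argument factory for a hasher.
CHECKSUMMERS: Dict[str, Callable[[], Any]] = {
    "md5": hashlib.md5,          # cryptographic
    "blake2b": hashlib.blake2b,  # cryptographic
    "crc32": Crc32Hasher,        # non-cryptographic, stdlib only
}


def register_checksummer(name: str, factory: Callable[[], Any]) -> None:
    """Let callers plug in their own checksummer under a new name."""
    CHECKSUMMERS[name] = factory


# Optional plugin: only available if the third-party xxhash package is installed.
try:
    import xxhash

    register_checksummer("xxh64", xxhash.xxh64)
except ImportError:
    pass


def file_checksum(path: str, algo: str = "md5") -> str:
    """Checksum a whole file with the user-selected algorithm."""
    if algo not in CHECKSUMMERS:
        raise ValueError(f"unknown checksummer {algo!r}; choose from {sorted(CHECKSUMMERS)}")
    hasher = CHECKSUMMERS[algo]()
    with open(path, "rb") as f:
        # Feed the file in fixed-size chunks so huge files never sit in memory.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

A call such as `file_checksum(path, "crc32")` versus the default `"md5"` switches algorithms without changing the caller. Registering factories rather than one-shot functions keeps every checksummer incremental, which is what lets large files be streamed chunk by chunk.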