[Beowulf] Rsync - checksums
Christopher Samuel
chris at csamuel.org
Mon Jun 17 08:29:53 PDT 2019
On 6/17/19 6:43 AM, Bill Wichser wrote:
> md5 checksums take a lot of compute time with huge files and even with
> millions of smaller ones. The bulk of the time for running rsync is
> spent in computing the source and destination checksums and we'd like to
> alleviate that pain of a cryptographic algorithm.
First of all, I would note that rsync only uses checksums to decide what
to transfer if you tell it to (-c / --checksum); otherwise it just uses
file times and sizes.
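As a rough illustration of that difference, here is a minimal Python sketch of the two decision rules. It is not rsync's actual code; the function names needs_transfer and _md5 are made up for this example, and the MD5 choice simply mirrors the checksum mentioned in the question.

```python
import hashlib
import os


def _md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(src, dst, use_checksum=False):
    """Decide whether src should be copied over dst (illustrative only).

    Default mode: compare size and modification time, which needs only stat().
    Checksum mode: compare size and a full-content hash, which reads both files.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True
    if use_checksum:
        return _md5(src) != _md5(dst)
    return int(s.st_mtime) != int(d.st_mtime)
```

The default path touches only metadata, which is why it is cheap even across millions of files; the checksum path has to read every byte on both sides.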
rsync is also single-threaded, so I would take a look at what was
previously called parsync, but is now parsyncfp :-)
http://moo.nac.uci.edu/~hjm/parsync/
There is a caveat there, though:
# As a warning, the main use case for parsyncfp is really only
# very large data transfers thru fairly fast network connections
# (>1Gb). Below this speed, rsync itself can saturate the
# connection, so there’s little reason to use parsyncfp and in
# fact the overhead of testing the existence of and starting more
# rsyncs tends to worsen its performance on small transfers to
# slightly less than rsync alone.
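To make the general idea of running several rsyncs side by side concrete, here is a small Python sketch that splits a file list into size-balanced chunks and builds one rsync command line per chunk. This is not how parsyncfp itself is implemented; partition_by_size and rsync_commands are hypothetical helpers, and the only rsync options assumed are -a and --files-from.

```python
import heapq


def partition_by_size(files, n_workers):
    """Split (path, size) pairs into n_workers buckets of similar total size.

    Greedy heuristic: take files largest first and always add the next one
    to the bucket that currently holds the fewest bytes.
    """
    heap = [(0, i) for i in range(n_workers)]  # (bytes assigned, bucket index)
    buckets = [[] for _ in range(n_workers)]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(path)
        heapq.heappush(heap, (load + size, i))
    return buckets


def rsync_commands(list_files, src_root, dest):
    """Build one rsync argv per list file; each could be launched as its own process."""
    return [["rsync", "-a", f"--files-from={lf}", src_root, dest]
            for lf in list_files]
```

Each bucket would be written to its own list file (paths relative to src_root) and the resulting commands started concurrently, for example with subprocess.Popen. As the caveat above says, this only pays off when a single rsync cannot fill the link.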
Good luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA