[Beowulf] Rsync - checksums

pellman.john at gmail.com pellman.john at gmail.com
Mon Jun 17 10:35:01 PDT 2019


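I know that some Intel chips have instruction extensions (the Intel SHA
Extensions) that speed up SHA-1 and SHA-256 checksums by computing them
directly in hardware.  They cover SHA rather than md5, so rsync would need
a different algorithm wedged in to benefit.  Might be worth looking into: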
https://software.intel.com/en-us/articles/intel-sha-extensions
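
If you want to check whether a given Linux box has them, something like
the sketch below should do.  It assumes an x86 machine where the kernel
reports the extensions as the "sha_ni" flag in /proc/cpuinfo; the function
names are just mine for illustration.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the union of all CPU flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list feature flags on lines whose key is "flags"
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_sha_extensions(cpuinfo_text):
    """True if the Intel SHA extensions flag (sha_ni) is present."""
    return "sha_ni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("SHA extensions:", has_sha_extensions(cpuinfo.read_text()))
    else:
        print("no /proc/cpuinfo on this platform")
```

Whether your checksum library actually uses the instructions is a separate
question (it depends on how OpenSSL or similar was built), so treat this
only as a first filter.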

More recently, Intel has been promoting QuickAssist/QAT, which also seems
to perform hardware acceleration for SHA algorithms (seems like a possible
re-branding / architecture recycling).  There's some integration with ZFS
for this.
https://drive.google.com/file/d/0B_J4mRfoVJQRV3ZOd1ZMWkphcV9OYXdWT0FBblVHbVZpSmZj/view

SPARC has also had a large number of cipher algorithms hardwired into its
architecture recently (what Oracle is calling "Software in Silicon").  See
here <http://storageconference.us/2017/Presentations/Phillips.pdf>.  Of
course, to take advantage of this technology you'd have to deal with
Oracle, as well as an increasingly uncommon CPU architecture.
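
On Michael's idea (quoted below) of a tool that checksums files in
parallel and logs the results to a database, here is a rough sketch of
what that could look like.  Everything in it is my own assumption rather
than anything from rsync or pftool: the table layout, the function names,
SHA-256 as the algorithm, and SQLite as the database.  Note that only the
hashing is parallel here; the tree walk itself is still serial.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read files 1 MiB at a time to keep memory use flat


def file_digest(path, algo="sha256"):
    """Hash one file in chunks and return (path, hex digest)."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return path, h.hexdigest()


def iter_files(root):
    """Yield regular files under root, skipping symlinks."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                yield full


def checksum_tree(root, db_path, workers=8, algo="sha256"):
    """Hash every file under root and record the digests in SQLite.

    Returns the number of rows in the checksums table afterwards.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS checksums "
        "(path TEXT PRIMARY KEY, algo TEXT, digest TEXT)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads do the hashing; all database writes stay in
        # this thread, since the connection belongs to it.
        rows = pool.map(lambda p: file_digest(p, algo), iter_files(root))
        con.executemany(
            "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?)",
            ((path, algo, digest) for path, digest in rows),
        )
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM checksums").fetchone()[0]
    con.close()
    return count
```

You would call it as checksum_tree("/some/tree", "sums.db") on each side
and then compare the two tables.  Threads are probably adequate because
hashlib releases the GIL while hashing large buffers, but on a big
parallel filesystem you would likely want multiple processes or nodes.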


On Mon, Jun 17, 2019 at 12:34 PM Loncaric, Josip via Beowulf <
beowulf at beowulf.org> wrote:

> Why not use existing pftool?
>
> https://github.com/pftool/pftool
>
> -Josip
>
> On 6/17/19 10:07 AM, Michael Di Domenico wrote:
> > just out of morbid curiosity i popped through the rsync code.  it
> > doesn't look terribly difficult to wedge in a new algo.  but honestly,
> > if i was going to go through the trouble i'd write a new tool that
> > walks the file tree in parallel and logs the checksums to a database.
> > i've had problems rsync'ing big filesystems in the past, so i try to
> > avoid it as a DR or poor-man's snapshotting
> >
> > On Mon, Jun 17, 2019 at 11:30 AM Christopher Samuel <chris at csamuel.org> wrote:
> >> On 6/17/19 6:43 AM, Bill Wichser wrote:
> >>
> >>> md5 checksums take a lot of compute time with huge files and even with
> >>> millions of smaller ones.  The bulk of the time for running rsync is
> >>> spent in computing the source and destination checksums and we'd like to
> >>> alleviate that pain of a cryptographic algorithm.
> >> First of all I would note that rsync only uses checksums if you tell it
> >> to, otherwise it just uses file times and sizes to determine what to
> >> transfer.
> >>
> >> rsync is also single-threaded, so I would take a look at what was
> >> previously called parsync, but is now parsyncfp :-)
> >>
> >> http://moo.nac.uci.edu/~hjm/parsync/
> >>
> >> There is the caveat there though:
> >>
> >> # As a warning, the main use case for parsyncfp is really only
> >> # very large data transfers thru fairly fast network connections
> >> # (>1Gb). Below this speed, rsync itself can saturate the
> >> # connection, so there’s little reason to use parsyncfp and in
> >> # fact the overhead of testing the existence of and starting more
> >> # rsyncs tends to worsen its performance on small transfers to
> >> # slightly less than rsync alone.
> >>
> >> Good luck!
> >> Chris
> >> --
> >>     Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> --
> Dr. Josip Loncaric, LANL, MS-T001, P.O. Box 1663, Los Alamos, NM 87545
> mailto:josip at lanl.gov   Cell: +1-505-412-8490   Phone: +1-505-412-6538
> --
> E Pluribus Unum
>