[Beowulf] 10G and rsync

Chris Dagdigian dag at sonsorol.org
Thu Jan 2 07:35:10 PST 2020

A few times a year I need to shift a few petabytes over the wire for a 
client, and based on last year's project, some thoughts ...

- I noticed you did not test small file / metadata operations. In my 
experience that is the #1 cause of slowness in rsync and other file 
transfers. iperf and IOR tests are all well and good, but you should 
also run something like mdtest to hammer the system on metadata and 
small-file handling. If you are moving lots of tiny files or hundreds 
of thousands of directories, this could be your problem.
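For reference, a minimal mdtest run looks something like the sketch below. The rank count, item counts, and path are placeholders; adjust them for your MPI setup and the filesystem under test.

```shell
# Hypothetical mdtest invocation: 16 MPI ranks, each creating, statting,
# and removing 1000 files/directories per iteration, over 3 iterations,
# under a scratch directory on the filesystem being tested.
mpirun -np 16 mdtest -n 1000 -i 3 -d /mnt/target_fs/mdtest_scratch
```

The interesting numbers in the output are the create/stat/remove rates in ops/sec -- if those are low, no amount of network tuning will save a transfer of millions of tiny files.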

Let's put it this way: when it comes to shifting small files, my $2500 
QNAP NAS box in my basement outperforms a $1.5 million NAS from a 
vendor I can't name here because their employees are "sensitive" to 
criticism. Heh.

- I have seen a small but measurable improvement from compiling the 
latest rsync from source on my data-mover nodes.

- A single stream over 10gig has never been great for me for big data 
movement. I get far more throughput by running rsync in parallel, 
multi-stream, either from a single 10gig-connected host or from a 
cluster of them.
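The crudest version of the multi-stream trick is just fanning copy jobs out over the top-level subdirectories with xargs. A minimal sketch (paths and job count are made up; `cp -r` stands in for `rsync -a` here so the sketch runs anywhere -- for a real transfer substitute rsync, with ssh or rsyncd as appropriate):

```shell
# parallel_copy SRC DST NJOBS -- copy each top-level entry of SRC into
# DST, running NJOBS copies at a time.  Assumes entry names contain no
# whitespace.  cp -r is a stand-in for "rsync -a" in this sketch.
parallel_copy() {
    src=$1 dst=$2 jobs=${3:-4}
    mkdir -p "$dst"
    ls "$src" | xargs -P "$jobs" -I{} cp -r "$src/{}" "$dst/"
}
```

Usage would be something like `parallel_copy /data/src /data/dst 8`. The obvious weakness is that the jobs are only as balanced as your directory layout, which is exactly the problem fpart/fpsync solves below.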

- The best tool I know for petascale data movement with rsync (or 
parallel armies of rsync) is "fpsync" from 
https://github.com/martymac/fpart/blob/master/tools/fpsync -- the fpart 
package includes a filesystem crawler that can spit out balanced file 
lists to feed to an army of copy agents of your choosing (I use 
rsync). If you have a ton of data to move and are getting nailed by 
single-stream throughput, it may make sense to try fpsync and run 
several rsync sessions in parallel.
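An fpsync run can be as simple as the sketch below (paths, part size, and worker count are placeholders; check your fpsync version's man page, since options have shifted between releases):

```shell
# Hypothetical fpsync run: crawl /data/src, cut the file list into
# balanced parts of at most 2000 files each, and keep 8 rsync workers
# busy in parallel until the whole tree is copied.
fpsync -n 8 -f 2000 /data/src/ /data/dst/
```

fpsync can also spread those workers across several data-mover hosts over ssh, which is how you get past the limits of a single 10gig NIC.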

My $.02!


Michael Di Domenico wrote on 1/2/20 10:26 AM:
> does anyone know or has anyone gotten rsync to push wire speed
> transfers of big files over 10G links?  i'm trying to sync a directory
> with several large files.  the data is coming from local disk to a
> lustre filesystem.  i'm not using ssh in this case.  i have 10G
> ethernet between both machines.   both end points have more than
> enough spindles to handle 900MB/sec.
> i'm using 'rsync -rav --progress --stats -x --inplace
> --compress-level=0 /dir1/ /dir2/' but each file (which is 100's of
> GB's) is getting choked at 100MB/sec
> running iperf and dd between the client and the lustre hits 900MB/sec,
> so i fully believe this is an rsync limitation.
> googling around hasn't lent any solid advice, most of the articles are
> people that don't check the network first...
> with the prevalence of 10G these days, i'm surprised this hasn't come
> up before, or my google-fu really stinks.  which doesn't bode well
> given it's the first work day of 2020 :(
