[Beowulf] 10G and rsync
Bill Abbott
babbott at rutgers.edu
Thu Jan 2 07:52:18 PST 2020
Fpsync and parsyncfp both do a great job with multiple rsyncs, although
you have to be careful with --delete (each parallel rsync only sees its
slice of the tree, so deletions don't propagate the way a single
rsync's would). For fewer, larger files on an initial or one-time
transfer, the best performance comes from bbcp with multiple streams.
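
For a one-time push of big files, a bbcp invocation along these lines is
a reasonable sketch (the hostname, paths, and stream/window numbers are
placeholders to tune for your own hardware):

    # 8 parallel TCP streams, 8 MB window, progress report every 5 seconds;
    # bbcp has to be installed and reachable on both ends
    bbcp -s 8 -w 8m -P 5 /dir1/* desthost:/dir2/
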
Also, jack up the TCP send buffer and turn on jumbo frames.
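
Concretely, something like this sketch (the values are illustrative
rather than tuned, and eth0 is a placeholder interface name):

    # allow larger TCP send/receive buffers
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.ipv4.tcp_wmem="4096 87380 67108864"
    sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
    # jumbo frames only help if every NIC and switch port in the path
    # carries the same MTU
    ip link set dev eth0 mtu 9000
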
Bill
On 1/2/20 10:48 AM, Paul Edmon wrote:
> I also highly recommend fpsync. Here is a rudimentary guide:
> https://www.rc.fas.harvard.edu/resources/documentation/transferring-data-on-the-cluster/
>
>
> I can get line speed with fpsync, but single rsyncs usually only get
> up to about 0.3-1 GB/s. You really want that parallelism. We use
> fpsync for all our large-scale data movement here and Globus for
> external transfers.
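>
> A minimal fpsync run along those lines might look like this sketch
> (the worker count, per-job file limit, and paths are placeholders;
> -o just passes options through to each underlying rsync):
>
>     # 8 concurrent rsync workers, at most 2000 files per work unit
>     fpsync -n 8 -f 2000 -o "-a --inplace" /dir1/ /dir2/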
>
> -Paul Edmon-
>
> On 1/2/20 10:45 AM, Joe Landman wrote:
>>
>> On 1/2/20 10:26 AM, Michael Di Domenico wrote:
>>> does anyone know or has anyone gotten rsync to push wire speed
>>> transfers of big files over 10G links? i'm trying to sync a directory
>>> with several large files. the data is coming from local disk to a
>>> lustre filesystem. i'm not using ssh in this case. i have 10G
>>> ethernet between both machines. both end points have more than
>>> enough spindles to handle 900MB/sec.
>>>
>>> i'm using 'rsync -rav --progress --stats -x --inplace
>>> --compress-level=0 /dir1/ /dir2/' but each file (several hundred GB)
>>> is getting choked at 100MB/sec
>>
>> A few thoughts
>>
>> 1) are you sure your traffic is traversing the high bandwidth link?
>> Always good to check ...
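>>
>> A quick sanity check, as a sketch (the address and interface name are
>> placeholders):
>>
>>     # which interface does the kernel actually route to the destination?
>>     ip route get 192.168.10.2
>>     # watch the 10G interface's byte counter climb while the copy runs
>>     watch -n 1 cat /sys/class/net/eth2/statistics/tx_bytes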
>>
>> 2) how many files are you xfering? Are these generally large files or
>> many small files, or a distribution with a long tail towards small
>> files? The latter two will hit your metadata system fairly hard, and
>> in the case of Lustre, performance will depend critically upon the
>> MDS/MDT architecture and implementation. FWIW, on the big system I was
>> setting up late last year we hit reads/writes in the millions of IOPS,
>> but then again, it was architected correctly.
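>>
>> A quick way to eyeball that distribution, as a sketch (assumes GNU
>> find; awk's log() is natural log, hence the division by log(10)):
>>
>>     # bucket files by order of magnitude in size
>>     find /dir1 -type f -printf '%s\n' | \
>>       awk '{ n[int(log($1+1)/log(10))]++ }
>>            END { for (b in n) printf "~1e%d bytes: %d files\n", b, n[b] }'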
>>
>> 3) wire speed xfers are generally the exception unless you are doing
>> large sequential single files. There are tricks you can do to enable
>> this, but they are often complex. You can use an array of
>> writers/readers and leverage parallelism, but you risk invoking
>> congestion/pause throttling on your switch.
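>>
>> As a sketch of the reader/writer-array idea (purely illustrative;
>> --delete is unsafe once the tree is split up like this, and files
>> sitting directly in the top-level directory get skipped):
>>
>>     # fan the top-level subdirectories out across 4 concurrent rsyncs
>>     cd /dir1 && ls -d */ | xargs -P4 -I{} \
>>       rsync -a --inplace --compress-level=0 {} /dir2/{}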
>>
>>
>>>
>>> running iperf and dd between the client and the lustre hits 900MB/sec,
>>> so i fully believe this is an rsync limitation.
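>>>
>>> for reference, the kind of baseline i mean (host and test path are
>>> placeholders):
>>>
>>>     iperf -c desthost -P 4
>>>     dd if=/dev/zero of=/lustre/testfile bs=1M count=10000 oflag=direct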
>>>
>>> googling around hasn't yielded any solid advice; most of the articles
>>> are from people who don't check the network first...
>>>
>>> with the prevalence of 10G these days, i'm surprised this hasn't come
>>> up before, or my google-fu really stinks, which doesn't bode well
>>> given it's the first work day of 2020 :(
>>>
>>
>