[Beowulf] Daisychained rcp script

Mon Mar 21 23:04:12 PST 2005

On Mon, 21 Mar 2005 15:51:24 -0800, David Mathog
<mathog at mendel.bio.caltech.edu> wrote:
> Here's a script for copying a file across a list of nodes.
> 
> ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/pdist_file.sh
> 
> It uses a daisychain method similar to that in "dolly".  I'm
> a bit curious how it holds up on larger sites with different network
> hardware. We have a switched 100baseT network with data starting
> on the headnode and going to up to 20 nodes, all nodes are identical.
> Here are some timings with nothing else running:
> 
> Nodes  Time (s)    Mb/s          Repeater Nodes
> 1      8           10.8          0
> 2      8.9 - 9.3   9.7 - 9.3     1
> 1-5    13.5-14.8   6.4 - 5.8     4
> 1-10   13.5-17.4   6.4 - 5.0     9
> 1-20   19.5-20.5   4.4 - 4.2    19

I only had a quick look at your script, but it seems that it uses
(named) pipes and "tee", so I'd guess it does more data copies than
"dolly" (which implements the whole replication in a single C
program). That could explain a difference in throughput between 1 node
and multiple nodes, because the repeater nodes limit the performance.
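
As a rough illustration of the single-program approach: a repeater can
read each block once and hand the same buffer to both the local disk and
the next node, instead of pushing the data through "tee" and a pipe.
This is only a sketch in Python, not dolly's actual code; the name
relay() and the 64 KiB block size are assumptions made for the example.

```python
import io

BLOCK_SIZE = 64 * 1024  # assumed block size; not taken from dolly or pdist_file.sh


def relay(upstream, sinks, block_size=BLOCK_SIZE):
    """Copy upstream to every sink, one block at a time; return bytes relayed."""
    total = 0
    while True:
        block = upstream.read(block_size)
        if not block:  # an empty read means end of stream
            break
        for sink in sinks:
            sink.write(block)  # the same buffer goes to disk and to the next node
        total += len(block)
    return total


if __name__ == "__main__":
    # In-memory stand-ins for the incoming socket, the local file and the
    # outgoing socket of one repeater node.
    incoming = io.BytesIO(b"x" * 200_000)
    local_copy, next_node = io.BytesIO(), io.BytesIO()
    print(relay(incoming, [local_copy, next_node]))
```

On a real node the three streams would be buffered file objects wrapped
around sockets and a disk file; the loop itself would stay the same.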

A reason for the further decrease in performance with higher numbers
of nodes might be that the pipes and your network connection don't use
the same blocksize (I'm not sure though), which could result in
"hiccups" in the daisychain due to bad synchronisation between the data
streams.

[...]
> I also timed this using my variant of dolly 0.57C, which should
> be about the same as 0.58.  Interestingly even though dolly
> reports that it is moving
> 
> Time: 8.935656
> MBytes/s: 9.674
> 
> when I use "time" to measure the actual elapsed time the transfer
> actually takes 16.0 seconds total elapsed time, for 5.4 Mb/s.
> (And that doesn't count the 1 second or so for rsh to set up
> the 20 slave dolly processes.)  So dolly is a little better
> than my simple script but it also can't keep the network
> running flat out.

I didn't check dolly's code, but I guess it doesn't measure the
startup and teardown phases, because I was mostly interested in
throughput for very large files (that's what dolly was written for).
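
The two figures can be reconciled with a little arithmetic, assuming
that the rate dolly reports is simply the file size divided by the
interval it timed, and that the quoted units are megabytes. The function
names below are made up for the example.

```python
def file_size_mb(reported_seconds, reported_mb_per_s):
    """File size implied by the reported time and rate."""
    return reported_seconds * reported_mb_per_s


def effective_rate(size_mb, elapsed_seconds):
    """Throughput over the full wall-clock time."""
    return size_mb / elapsed_seconds


size = file_size_mb(8.935656, 9.674)  # about 86.4 MB
rate = effective_rate(size, 16.0)     # about 5.4 MB/s, matching the "time" result
outside = 16.0 - 8.935656             # about 7.1 s outside the timed interval

print(f"size {size:.1f} MB, effective {rate:.1f} MB/s, untimed {outside:.1f} s")
```

Under those assumptions roughly 7 of the 16 seconds fall outside the
interval dolly times, which is what one would expect if startup and
teardown are not counted.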

To get rid of the startup phase in dolly -- and thus achieve higher
throughput for medium-sized files -- one might want to use a dolly
daemon. Such a daemon would be started once, set up all the daisychain
connections, and then wait for files to transmit. Thus, the file
replication could start immediately after writing the file to the
dolly server daemon, without any setup or teardown delays.
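
One thing such a daemon would need is a way to tell where one file ends
and the next begins on a connection that stays open. A simple option is
a length prefix in front of each file. The sketch below is purely
hypothetical (no such daemon or wire format exists in dolly as far as
this message goes); send_file(), recv_file() and the 8-byte header are
invented for the example.

```python
import io
import struct

HEADER = struct.Struct("!Q")  # 8-byte big-endian payload length (assumed framing)


def send_file(stream, payload):
    """Write one length-prefixed file onto a long-lived stream."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def recv_file(stream):
    """Read the next file from the stream; return None once the chain is closed."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        return None
    (length,) = HEADER.unpack(header)
    # For brevity the whole payload is held in memory; a real daemon would
    # stream it block by block to disk and to the next node.
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended in the middle of a file")
    return payload


if __name__ == "__main__":
    chain = io.BytesIO()  # stands in for an already established connection
    send_file(chain, b"first file")
    send_file(chain, b"second file")
    chain.seek(0)
    print(recv_file(chain), recv_file(chain), recv_file(chain))
```

With framing like this the connections are set up once, and each new
file costs only the extra header bytes.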

- Felix