[Beowulf] fast file copying

Sun May 6 20:20:38 PDT 2007

On 04/05/07, Bill Broadley <bill at cse.ucdavis.edu> wrote:
> Geoff Galitz wrote:
> > During an HPC talk some years ago, I recall someone mentioned a tool
> > which can copy large datasets across a cluster using a ring topology.
> > Perhaps someone here knows of this tool?
>
> Not sure about a ring topology, seems kinda silly...

Why would that be silly? To clarify: the transmission through the ring
is pipelined, so all nodes work in parallel. While node n receives the
data stream from node n-1, it writes the stream to disk and at the same
time forwards it to node n+1.
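
To make the data flow concrete, here is a minimal in-process sketch of
such a pipeline. It is only an illustration under stated assumptions,
not the actual tool: the names (read_chunks, ring_node, distribute) and
the chunk size are hypothetical, the "disks" are in-memory buffers, and
a real implementation would connect the nodes with TCP sockets so that
they truly run concurrently.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical block size, not from the original post


def read_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a binary stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def ring_node(upstream, local_disk):
    """Model node n: store each chunk from node n-1, then pass it to node n+1."""
    for chunk in upstream:
        local_disk.write(chunk)  # write the stream to the local disk
        yield chunk              # forward the same chunk to the successor


def distribute(source, disks, chunk_size=CHUNK_SIZE):
    """Chain one ring_node per disk and drain the pipeline at the last node."""
    stream = read_chunks(source, chunk_size)
    for disk in disks:
        stream = ring_node(stream, disk)
    received = 0
    for chunk in stream:  # the last node has no successor; just count bytes
        received += len(chunk)
    return received


payload = bytes(range(256)) * 1024        # 256 KiB of test data
disks = [io.BytesIO() for _ in range(4)]  # four simulated nodes
total = distribute(io.BytesIO(payload), disks)
```

Because the generators are lazy, each chunk travels through all four
simulated nodes before the next chunk is read, so no node ever has to
hold the whole file in memory.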

I have yet to see a tool that achieves better data rates in practice
for reliable, high-speed, large-scale data distribution in clusters.

> > More to the point, we are pushing around datasets that are about
> > 1Gbyte.  The datasets are pushed out to dozens of nodes all at once and
>
> How often?  I just bit-torrented a 1GB file to 165 nodes in 3 minutes,
> 1.5 minutes was the lazy way I launched it (the last node didn't
> start until 1.5 minutes into the run).  BTW, 140 or so of those nodes
> already had 1 job per CPU running.

A 1 GB file in 1.5 minutes translates to about 11 MB/s, which sounds a
lot like Fast Ethernet (100 Mbit/s). By today's standards that's
relatively slow, and it's quite likely that the network will be the
bottleneck for almost any tool.
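
The arithmetic behind that estimate, as a small sketch. It assumes
decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and takes the
effective transfer time to be the 90 seconds left after the launch
delay; the function name is only illustrative.

```python
def throughput_mb_per_s(size_bytes, seconds):
    """Average per-node data rate in MB/s (decimal megabytes)."""
    return size_bytes / seconds / 1e6


size_bytes = 10**9           # 1 GB file
seconds = 90                 # 1.5 minutes of actual transfer
rate_mb = throughput_mb_per_s(size_bytes, seconds)
rate_mbit = rate_mb * 8      # same rate expressed in Mbit/s

fast_ethernet_mbit = 100     # nominal Fast Ethernet line rate
utilization = rate_mbit / fast_ethernet_mbit

print(f"{rate_mb:.1f} MB/s = {rate_mbit:.1f} Mbit/s "
      f"({utilization:.0%} of Fast Ethernet)")
```

That is roughly 11.1 MB/s, or about 89 Mbit/s, which is close to what a
100 Mbit/s link can deliver after protocol overhead.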

> There are various ways to maximize I/O with bit-torrent.  Various
> seeders allow uploading each block only once (usually called super
> seeder mode).  Assuming you have a few GB ram on the file server
> you could even prefetch the file before torrenting (i.e. dd if=file_to_server
> of=/dev/null) since the limit on bit-torrent bandwidth is often how
> quickly you can seek.
>
> Additionally you can make the chunk size larger to reduce the number
> of seeks.  On the client side preallocation can greatly reduce
> the number of seeks.

More advantages of the ring topology: every node uploads every block
exactly once, and no prefetching or seeks are required (if you
replicate a whole partition or a single large file).
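
A rough back-of-the-envelope model shows why uploading each block once
per node matters. This is a simplified sketch and not the model from
the paper referenced below: it assumes every link runs at the same
bandwidth, ignores disk and protocol overhead, and compares the ring
against a naive server that sends a separate full copy to each node.
The function names, the chunk size, and the link speed are assumptions
chosen for illustration; the node count is taken from the quoted
example.

```python
def ring_time(size, nodes, bandwidth, chunk):
    """Seconds until the last of `nodes` receivers has the whole file.

    The source streams the file once (size / bandwidth); the final chunk
    then needs one store-and-forward hop per remaining node.
    """
    return (size + (nodes - 1) * chunk) / bandwidth


def naive_server_time(size, nodes, bandwidth):
    """Seconds if a single server uploads one full copy per node."""
    return nodes * size / bandwidth


size = 10**9          # 1 GB file, in bytes
nodes = 165           # node count from the quoted example
bandwidth = 12.5e6    # assumed 100 Mbit/s link, in bytes per second
chunk = 10**6         # assumed 1 MB forwarding unit

t_ring = ring_time(size, nodes, bandwidth, chunk)
t_naive = naive_server_time(size, nodes, bandwidth)
print(f"ring: {t_ring:.1f} s, naive server: {t_naive:.1f} s")
```

In this model the ring's total time is dominated by a single pass of
the file over one link, and adding nodes only adds one chunk time per
node.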

If you are interested in more details about the technology, like
models and performance measurements (somewhat old by now), check out
the second paper in this list:

http://www.cs.inf.ethz.ch/cops/patagonia/#relmat

- Felix