[Beowulf] fast file copying

Fri May 4 01:48:50 PDT 2007

Geoff Galitz wrote:
> 
> Hi folks,
> 
> During an HPC talk some years ago, I recall someone mentioned a tool 
> which can copy large datasets across a cluster using a ring topology.  
> Perhaps someone here knows of this tool?

Not sure about a ring topology, seems kinda silly... why not bit-torrent?
It's opensource, extremely common, and already integrated into at least
one cluster distribution.  There's a zillion implementation, your favorite
language is likely to have a few (at least for python, c, and java).

I've installed a 170+ node rocks cluster in 10 minutes or so, the RPMs
are distributed by bit-torrent so that it doesn't matter if one node
dies as part of the install.  Nor does it matter if your node list has
some strange mapping to your physical network (which is often what you
get when you ask a batch queue for 5% of a cluster).

> More to the point, we are pushing around datasets that are about 
> 1Gbyte.  The datasets are pushed out to dozens of nodes all at once and 

How often?  I just bit-torrented a 1GB file to 165 nodes in 3 minutes,
1.5 minutes was the lazy why I launched it (the last node didn't
start until 1.5 minutes into the run).  BTW, 140 or so of those nodes
already had 1 job per CPU running.

> we foresee saturating the I/O system on our cluster as we grow.  We are 
> limited to using just the available disks and are looking for a 
> reasonable solution that can support this kind of simultaneous access.   

There are various ways to maximize I/O with bit-torrent.  Various
seeders allow uploading each block only once (usually called super
seeder mode).  Assuming you have a few GB ram on the file server
you could even prefetch the file before torrenting (i.e. dd if=file_to_server
of=/dev/null) since the limit on bit-torrent bandwidth is often how
quickly you can seek.

Additionally you can make the chunk size larger to reduce the number
of seeks.  On the client side preallocation can greatly reduce
the number of seeks.

> Currently we push the data out using rsync, but if I don't get any 
> better ideas I may simply move to a pull system where the data is 
> fetched by HTTP.  I can get better throttling that way, at least.
> 

If you have a low churn rate you could generate a diff (with rsync)
and distribute that via bit-torrent.

What kind of per node bandwidths are you hoping for?  1GB sounds really
easy unless you have to do it rather often.