[Beowulf] Re: copying data between clusters

David Mathog mathog at caltech.edu
Fri Mar 5 14:14:25 PST 2010


Michael Di Domenico wrote:

> lets see if i can clarify
> 
> assuming there are two clusters - clusterA and clusterB
> 
> Each cluster is 32nodes and has 50TB of storage attached

Attached how?  Is the 50TB sitting on one file server on each cluster,
or is it distributed across the cluster?  We need more details.

> 
> the aggregate network bandwidth between the clusters is 800MB/sec
> 
> the problem is the per-node bandwidth on clusterB is 30MB/sec

Is there a switch on each cluster so that each node can write directly
to the interconnect between clusters?   Specifically, can node A12 write
to node B12?  Sounds like there might be, and since you seem to care
about the per-node bandwidth on the target, you probably have a
situation where the data is distributed on A and will again be
distributed across nodes on B.  If that's what you mean, then you just
need to queue up a job on each node to do something like:

 (cd $DATADIRECTORY ; tar -cf - . ) \
   | ssh matching_target_node '(cd $DATADIRECTORY ; tar -xf -)'
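
For example, to fan that out across all 32 node pairs from a head node
(a rough sketch only, assuming passwordless ssh, a hypothetical
a01..a32 / b01..b32 node naming scheme, and the same $DATADIRECTORY
path on every node):

 for i in $(seq -w 1 32) ; do
   # each A node tars up its local data and pipes it straight to the
   # matching B node; the trailing & runs all 32 pairs concurrently
   ssh a$i "cd $DATADIRECTORY && tar -cf - . | \
     ssh b$i '(cd $DATADIRECTORY ; tar -xf -)'" &
 done
 wait   # block until every pair has finished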

It will run in parallel using up all of your interconnect bandwidth.
If, on the other hand, the only per-node rate you care about is the one
fileserver on B, then it is a different problem.  On the other, other
hand, if you can temporarily store the data on each node of B, and the
cumulative bandwidth that way is 800MB/sec, you could conceivably
transfer it in parallel from A to all 32 destinations in B and put the
mess back together in B later.  However, if you are still rate limited
to 30MB/sec on a single B fileserver, then the total time to complete
the operation will not change (50TB at 30MB/sec is roughly 19 days);
only the time the data is in transit between the clusters will be
reduced.
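
If you do stage the data temporarily on the B nodes, the "put it back
together" step might look something like this (again only a sketch,
assuming a hypothetical /scratch/incoming staging directory on each B
node and the fileserver's space NFS-mounted as /bigfs on every node;
it runs one node at a time, since the single fileserver is the
30MB/sec bottleneck regardless):

 for i in $(seq -w 1 32) ; do
   # move each node's staged chunk to its final home on the fileserver;
   # serialized, because the fileserver is the limit anyway
   ssh b$i "cd /scratch/incoming && tar -cf - . | \
     (cd /bigfs/restored ; tar -xf -)"
 done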


Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


