[Beowulf] copying big files

Erwan Velu erwan at seanodes.com
Thu Aug 14 07:42:03 PDT 2008

Henning Fehrmann wrote:
> Hi everybody,
> Copying a big file onto all nodes in a cluster is a rather common problem.
> I would have thought that there might be a standard tool for
> distributing files in an efficient way. So far, I haven't found one.
> Assume one has a network design which allows non-blocking full-duplex
> wire-speed connections between N/2 pairs of nodes, where N is the number
> of nodes in the cluster. It is basically a non-blocking core switch.
> In this case the following scheme would be convenient and rather simple:
> The file is placed on node n1 and one builds a chain of nodes n1, n2, ..., nN.
> One splits the file into many packets (p1..pM); let's say a fragment fits
> into one TCP packet. In the first step n1 transmits packet p1 to node n2.
> In the second step n1 transmits packet p2 to n2 and n2 transmits p1 to node n3.
> The transmission of a single packet is fast. The time needed to pass a particular
> packet through the whole chain of nodes is short compared with the time of the
> entire copying process. E.g., using jumbo frames a packet can have a size of ca. 10 kB.
> In a Gb network the transmission time of a single packet between nodes is
> of the order of 0.1 ms. Even in a cluster with 1024 nodes it takes
> in an ideal case just 0.1 s to pass a packet from node n1 through all nodes to n1024.
> On each node the packet is stored and, in the end, one reassembles the file.
> For big files (size >> 10 MB) the required time is approximately
> the same as one needs for copying the file between two nodes plus 0.1 s.
> One basically needs a daemon which handles copy requests and establishes
> the connection to the next node in the chain.
> Has somebody written such a tool?
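
As a rough check of the timing estimate quoted above, here is a minimal
sketch of the idealized model: fixed-size fragments, store-and-forward at
every node, no latency and no protocol overhead. The function names and the
default values (10 kB fragments, 1 Gbit/s links) are my own assumptions taken
from the figures in the quoted text, not measurements.

```python
def chain_copy_time(file_bytes, n_nodes, packet_bytes=10_000, link_bits_per_s=1e9):
    """Idealized time to push a file through a chain n1 -> n2 -> ... -> nN."""
    packets = -(-file_bytes // packet_bytes)          # ceiling division
    hop_time = packet_bytes * 8 / link_bits_per_s     # one fragment over one link
    # The last fragment reaches n2 after `packets` steps and then needs
    # (n_nodes - 2) further hops to reach the end of the chain.
    return (packets + n_nodes - 2) * hop_time


def direct_copy_time(file_bytes, link_bits_per_s=1e9):
    """Idealized time for a plain copy between two nodes."""
    return file_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    size = 10**9  # a 1 GB file, chosen only as an example
    extra = chain_copy_time(size, 1024) - direct_copy_time(size)
    print(f"extra time for 1024 nodes: {extra:.3f} s")
```

In this model the cost of the chain is (N - 2) hop times on top of a plain
two-node copy, which for 1024 nodes is of the same order as the 0.1 s quoted
above.
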
Sounds like you are looking for http://taktuk.gforge.inria.fr/
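
For what it is worth, the chained relay from the quoted message can be
sketched in a few lines. This is only an in-process simulation under my own
assumptions: threads stand in for nodes, bounded queues stand in for the TCP
connections between neighbours, and the names relay, chain_copy and CHUNK are
made up for the illustration. It is not meant to describe how any existing
tool works internally.

```python
import queue
import threading

CHUNK = 10_000  # bytes per fragment (assumed, roughly one jumbo frame)


def relay(inbox, outbox, store):
    """Node loop: forward each fragment to the next node, then keep a copy."""
    while True:
        fragment = inbox.get()
        if outbox is not None:
            outbox.put(fragment)      # pass it on (including the end marker)
        if fragment is None:          # None marks the end of the file
            return
        store.append(fragment)


def chain_copy(data, n_nodes, chunk=CHUNK):
    """Send `data` from n1 down the chain; return the copies held by n2..nN."""
    if n_nodes < 2:
        raise ValueError("need at least a source and one receiver")
    # links[i] carries fragments into receiver i (node n(i+2)).
    links = [queue.Queue(maxsize=8) for _ in range(n_nodes - 1)]
    stores = [[] for _ in range(n_nodes - 1)]
    workers = []
    for i in range(n_nodes - 1):
        outbox = links[i + 1] if i + 1 < len(links) else None  # last node has no successor
        t = threading.Thread(target=relay, args=(links[i], outbox, stores[i]))
        t.start()
        workers.append(t)
    # n1 splits the file and feeds the head of the chain.
    for offset in range(0, len(data), chunk):
        links[0].put(data[offset:offset + chunk])
    links[0].put(None)
    for t in workers:
        t.join()
    # Each node reassembles the file from its stored fragments.
    return [b"".join(parts) for parts in stores]


if __name__ == "__main__":
    payload = bytes(range(256)) * 200
    copies = chain_copy(payload, n_nodes=5, chunk=1000)
    print(all(c == payload for c in copies))
```

Because every node forwards a fragment as soon as it has it, all links are
busy at the same time once the pipeline has filled, which is the point of the
scheme. A real daemon would replace the queues with sockets and would also
need to handle a node in the middle of the chain failing.
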

Erwan Velu
Pre-Sales Engineer
+33 (0)1 41 22 13 83
