[Beowulf] copying big files
Erwan Velu
erwan at seanodes.com
Thu Aug 14 07:42:03 PDT 2008
Henning Fehrmann wrote:
> Hi everybody,
>
> Copying a big file onto all nodes in a cluster is a rather common problem.
> I would have expected a standard tool for distributing files
> efficiently, but so far I haven't found one.
>
> Assume one has a network design that allows non-blocking, full-duplex,
> wire-speed connections between N/2 pairs of nodes, where N is the number
> of nodes in the cluster. It is basically a non-blocking core switch.
>
> In this case, the following scheme would be convenient and fairly simple:
>
> The file is placed on node n1, and one builds a chain of nodes n1, n2, ..., nN.
>
> One splits the file into many packets (p1..pM); let's say each fragment fits
> into one TCP packet. In the first step, n1 transmits packet p1 to node n2.
> In the second step, n1 transmits packet p2 to n2 while n2 transmits p1 to node n3.
>
> The transmission of a single packet is fast. The time needed to pass a particular
> packet through the whole chain of nodes is short compared with the time of the
> entire copying process. E.g., using jumbo frames, a packet can have a size of ca. 10 kB.
> On a gigabit network, the transmission time of a single packet between nodes is
> of the order of 0.1 ms. Even in a cluster with 1024 nodes, it takes
> in the ideal case just 0.1 s to pass a packet from node n1 through all nodes to n1024.
>
> On each node the packets are stored and, at the end, the file is reassembled.
> For big files (size >> 10 MB), the required time is approximately
> the same as that needed to copy the file between two nodes, plus 0.1 s.
>
> Basically, one needs a daemon that handles copy requests and establishes
> the connection to the next node in the chain.
>
> Has somebody written such a tool?
>
Sounds like you are looking for http://taktuk.gforge.inria.fr/
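The chain scheme described in the quoted message can be illustrated with a minimal, single-machine sketch in Python. It rests on assumptions and is not how TakTuk works: threads stand in for the per-node daemons, and bounded `queue.Queue` objects stand in for the TCP connections between neighboring nodes. The names `CHUNK`, `relay`, `broadcast` and `pipeline_time` are hypothetical. Each node forwards a fragment downstream as soon as it arrives and keeps a local copy, so fragments move through the chain concurrently; `pipeline_time` gives the ideal step count M + N - 2 for M fragments and N nodes.

```python
import queue
import threading
from typing import List, Optional

# Hypothetical fragment size: roughly one jumbo frame, as in the message.
CHUNK = 10_000


def pipeline_time(num_chunks: int, num_nodes: int, t_chunk: float) -> float:
    """Ideal completion time of a store-and-forward chain n1 -> ... -> nN.

    Fragment k (1-based) leaves n1 at step k and needs num_nodes - 1 hops,
    so the last fragment reaches nN after num_chunks + num_nodes - 2 steps.
    """
    return (num_chunks + num_nodes - 2) * t_chunk


def relay(upstream: queue.Queue, downstream: Optional[queue.Queue],
          store: List[bytes]) -> None:
    """One node's daemon (hypothetical): pass each fragment on, keep a copy."""
    while True:
        chunk = upstream.get()
        if downstream is not None:
            downstream.put(chunk)  # forward first so the pipeline keeps moving
        if chunk is None:          # end-of-file marker, already forwarded
            return
        store.append(chunk)


def broadcast(data: bytes, num_nodes: int) -> List[bytes]:
    """Copy data from n1 to n2..nN along a chain; return each receiver's copy."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    # links[i] carries fragments from n(i+1) to n(i+2); the small maxsize
    # mimics a limited TCP window (an assumption of this sketch)
    links = [queue.Queue(maxsize=4) for _ in range(num_nodes - 1)]
    stores: List[List[bytes]] = [[] for _ in range(num_nodes - 1)]
    threads = []
    for i in range(num_nodes - 1):
        # the last node in the chain has no successor
        down = links[i + 1] if i + 1 < len(links) else None
        t = threading.Thread(target=relay, args=(links[i], down, stores[i]))
        t.start()
        threads.append(t)
    # n1 splits the file into fragments and feeds them into the first link
    for off in range(0, len(data), CHUNK):
        links[0].put(data[off:off + CHUNK])
    links[0].put(None)
    for t in threads:
        t.join()
    # each receiving node reassembles its copy from the stored fragments
    return [b"".join(s) for s in stores]


if __name__ == "__main__":
    data = bytes(range(256)) * 4000  # about 1 MB of test payload
    copies = broadcast(data, num_nodes=16)
    print(all(c == data for c in copies))  # True: every receiver has the file
    # time for a single fragment to cross a 1024-node chain at 80 us per hop
    print(pipeline_time(1, 1024, 80e-6))
```

With N = 1024 nodes and about 80 µs per fragment (10 kB at 1 Gb/s), the chain adds (N - 2) × 80 µs, roughly 0.08 s, over a plain two-node copy of the same M fragments, in line with the 0.1 s estimate in the message. The bounded queues are a design choice of this sketch that loosely mimics TCP flow control: a slow node throttles its upstream neighbor instead of letting fragments pile up in memory.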
--
Erwan Velu
Pre-Sales Engineer
Seanodes
http://www.seanodes.com
+33 (0)1 41 22 13 83