[Beowulf] fast file copying
Felix Rauch Valenti felix.rauch.valenti at gmail.com
Sun May 6 20:20:38 PDT 2007
On 04/05/07, Bill Broadley <bill at cse.ucdavis.edu> wrote:
> Geoff Galitz wrote:
> > During an HPC talk some years ago, I recall someone mentioned a tool
> > which can copy large datasets across a cluster using a ring topology.
> > Perhaps someone here knows of this tool?
>
> Not sure about a ring topology, seems kinda silly...

Why would that be silly?
To clarify: The transmission through the ring happens in parallel,
i.e., while a node n receives the data stream from node n-1, it writes
the stream to disk and at the same time forwards it to node n+1.

I have yet to see a tool that can achieve better data rates in
practice, for reliable, high speed and large scale data distribution
in clusters.

> > More to the point, we are pushing around datasets that are about
> > 1Gbyte. The datasets are pushed out to dozens of nodes all at once and
>
> How often? I just bit-torrented a 1GB file to 165 nodes in 3 minutes,
> 1.5 minutes was the lazy way I launched it (the last node didn't
> start until 1.5 minutes into the run). BTW, 140 or so of those nodes
> already had 1 job per CPU running.

1 GB file in 1.5 minutes translates to about 11 MB/s, which sounds a
lot like Fast Ethernet (100 Mbit/s). By today's standards that's
relatively slow and it's quite likely that the network will be the
bottleneck for almost any tool.

> There are various ways to maximize I/O with bit-torrent. Various
> seeders allow uploading each block only once (usually called super
> seeder mode). Assuming you have a few GB ram on the file server
> you could even prefetch the file before torrenting (i.e. dd if=file_to_server
> of=/dev/null) since the limit on bit-torrent bandwidth is often how
> quickly you can seek.
>
> Additionally you can make the chunk size larger to reduce the number
> of seeks. On the client side preallocation can greatly reduce
> the number of seeks.

More advantages of the ring topology: It uploads every block on every
node exactly once, no prefetching and no seeks are required (if you
replicate a whole partition or a single large file).

If you are interested in more details about the technology, like
models and performance measurements (somewhat old by now), check out
the second paper in this list:
http://www.cs.inf.ethz.ch/cops/patagonia/#relmat

- Felix
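
For readers who want to see the pipelined forwarding in concrete form,
here is a minimal in-memory sketch in Python. It is not the tool from
the paper, only an illustration under stated assumptions: threads stand
in for cluster nodes, bounded queues stand in for network links, and a
BytesIO buffer stands in for each node's disk. The names node,
distribute, CHUNK_SIZE and queue_depth are hypothetical and chosen just
for this example.

```python
import io
import queue
import threading

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size for this sketch
END = None              # sentinel that marks the end of the stream


def node(upstream, downstream, disk):
    """Node n: receive from node n-1, forward to node n+1, write locally."""
    while True:
        chunk = upstream.get()
        if downstream is not None:
            downstream.put(chunk)  # forward first, so node n+1 is not kept waiting
        if chunk is END:
            break
        disk.write(chunk)          # stand-in for the local disk write


def distribute(data, num_nodes, queue_depth=4):
    """Stream `data` from a source through a chain of `num_nodes` receivers."""
    # links[i] is the (simulated) network link feeding node i.
    links = [queue.Queue(maxsize=queue_depth) for _ in range(num_nodes)]
    disks = [io.BytesIO() for _ in range(num_nodes)]
    threads = []
    for i in range(num_nodes):
        # The last node has nobody to forward to.
        downstream = links[i + 1] if i + 1 < num_nodes else None
        t = threading.Thread(target=node, args=(links[i], downstream, disks[i]))
        t.start()
        threads.append(t)

    # The source sends every chunk exactly once, and only to the first node.
    for offset in range(0, len(data), CHUNK_SIZE):
        links[0].put(data[offset:offset + CHUNK_SIZE])
    links[0].put(END)

    for t in threads:
        t.join()
    return [d.getvalue() for d in disks]
```

Calling distribute(payload, 5) returns one copy of the payload per
simulated node. Because every node forwards a chunk as soon as it
arrives, all links are busy at the same time once the pipeline has
filled, and each sender transmits each chunk only once. In this sketch
the last node does not forward anything.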
