[Beowulf] fast file copying

Geoff Galitz geoff at galitz.org
Thu May 10 15:06:31 PDT 2007

Thanks to all for responding.  Here is a follow-up:

We push our datasets out as part of a service deployment routine  
which includes a bunch of "other stuff" in addition to just getting  
the data to the nodes.  I went ahead and modified our service  
deployment program to use dolly.  Here is what we do:

- enter deployment phase
  - check for member nodes that are alive
  - dynamically build the config file
  - bring the ring up
  - start the transfer
  - finish
  - tear everything down
- enter next phase
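
As a rough illustration only, the phase above could be sketched like
this.  It is not the actual deployment program: every function and
parameter name here is hypothetical, and the real dolly config file
and commands are deliberately left out (they are passed in as
callables).

```python
from typing import Callable, Iterable, List


def deployment_phase(
    members: Iterable[str],
    is_alive: Callable[[str], bool],
    build_config: Callable[[List[str]], str],
    ring_up: Callable[[List[str], str], None],
    transfer: Callable[[str], None],
    tear_down: Callable[[List[str]], None],
) -> List[str]:
    """Run one deployment phase; return the nodes that took part.

    All callables are hypothetical stand-ins for the real steps.
    """
    # Check for member nodes that are alive.
    alive = [node for node in members if is_alive(node)]
    if not alive:
        return []

    # Dynamically build the config from the live nodes only.
    config = build_config(alive)
    try:
        ring_up(alive, config)   # bring the ring up
        transfer(config)         # start the transfer and wait for it
    finally:
        tear_down(alive)         # tear everything down, even on error
    return alive
```

Because the node list is rebuilt at the start of every phase, a node
that was offline last time is simply picked up (or skipped) on the
next run.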

With this system, we can support a dynamic environment where nodes go
online and offline at (our) will.

We use pdsh to do as much of the configuration and command execution
as possible.  This made dolly a better choice for us than nettee, as
we can issue the exact same command to all nodes in parallel.  Nettee
required more specific commands on each node.
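
To illustrate the "same command to all nodes" point, here is a small
sketch that builds a single pdsh invocation.  It assumes pdsh's -w
option with a comma-separated host list; the node names and the remote
command are placeholders, not the commands actually used.

```python
import shlex
from typing import List, Sequence


def pdsh_argv(nodes: Sequence[str], remote_command: str) -> List[str]:
    """Build the argv for one pdsh call that runs remote_command on all nodes."""
    if not nodes:
        raise ValueError("need at least one target node")
    # -w takes the list of target hosts; the same command goes to each.
    return ["pdsh", "-w", ",".join(nodes), remote_command]


# Placeholder hosts and command, for illustration only.
argv = pdsh_argv(["node01", "node02", "node03"], "hostname")
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv) as-is.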

In our testing environment, we're getting as much as 45MB/sec and as
little as 11MB/sec in our various scenarios (mismatched hardware,
busy network, different types of data).  We did achieve our primary
goal of reducing load on the master/server system.  In our old setup,
our load would increase to 25+.  With dolly, our load never exceeds 1.5.
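
For a sense of scale, the two ends of that range can be turned into
transfer times.  The 10 GB dataset size below is purely an assumed
figure for illustration (no dataset size is given here), and MB is
taken as 10^6 bytes.

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a sustained rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


ASSUMED_SIZE_MB = 10_000  # hypothetical 10 GB dataset

for rate in (45, 11):  # best and worst rates reported above
    minutes = transfer_seconds(ASSUMED_SIZE_MB, rate) / 60
    print(f"{rate} MB/sec: {minutes:.1f} minutes")
```

The spread between best and worst case is about a factor of four, so
the slow scenarios dominate the wall-clock time of a deployment.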

I also plan to run the same test with torrent.


> The normal ring based disadvantages, reliability and performance.

> Seems like any number of things could make the ring based approach
> a poor choice, where the worst case of the ring could dramatically
> slow things down.  Things like:
> * Head node's network connection is 10 times faster
> * A single node dies during the transfer
> * A single node joins late
> * A single node is very busy (I/O, memory constrained, or CPU)

more snipped
