[Beowulf] copying big files (Henning Fehrmann)

David Mathog mathog at caltech.edu
Mon Aug 18 08:38:09 PDT 2008

Henning Fehrmann wrote:

> I spread successfully a 10G file to 50 nodes. The rate was 140Mb/s for
> nettee and a bit slower using dolly.
> I guess it was due to a busy node somewhere in the chain.  
> Increasing the number of clients up to 100 failed in both cases.
> For nettee I got:
> nettee: fatal error writing to child: Connection reset by peer

> I will do more systematic tests in the next few days.
> David Mathog, are you interested in bug reports?

Yes, please. 

If memory serves, you will see that error whenever a child node, or
nettee on that child, crashes.  For instance, if you "kill -9" nettee on
a child, the parent should see that error.  The command option -colwf
will let the chain continue if the failure is caused by a full disk or a
failing stdout pipe.  The option -conwf should let the chain continue
the transfer down to the node just above the failed one, and it should
tell you which node failed, so long as -v is used with the appropriate
bits.
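
To illustrate the failure mode in general terms, here is a minimal
Python sketch of a parent writing to a child over TCP when the child
goes away abruptly.  This is not nettee code and does not use any
nettee option; the function names, the loopback setup, and the chunk
sizes are all assumptions made up for the example.  The child's crash
is simulated by closing its socket with SO_LINGER set to zero, which
makes the close send a reset instead of an orderly shutdown.

```python
import socket
import struct


def send_until_failure(sock, chunk, max_chunks):
    """Send `chunk` up to `max_chunks` times.

    Returns (chunks_sent, error), where error is None if every
    chunk was accepted by the local TCP stack.
    """
    sent = 0
    try:
        for _ in range(max_chunks):
            sock.sendall(chunk)
            sent += 1
    except OSError as exc:
        # ConnectionResetError and BrokenPipeError are both OSError
        # subclasses; a timeout would land here as well.
        return sent, exc
    return sent, None


def simulate_child_crash():
    """Hypothetical demo: parent and child on loopback, child dies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    parent = socket.create_connection(listener.getsockname())
    parent.settimeout(5.0)  # never hang forever if no error arrives
    child, _ = listener.accept()

    # Stand-in for "kill -9" on the child: linger on, timeout 0,
    # so close() aborts the connection with a reset.
    child.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                     struct.pack("ii", 1, 0))
    child.close()
    listener.close()

    result = send_until_failure(parent, b"x" * 65536, 1000)
    parent.close()
    return result


if __name__ == "__main__":
    chunks, error = simulate_child_crash()
    print("chunks sent before failure:", chunks)
    print("error seen by parent:", repr(error))
```

The parent only learns about the dead child when it next writes, which
is why the error shows up on the node above the one that failed.  The
errno ECONNRESET is the one whose message text is "Connection reset by
peer"; depending on timing and platform a broken-pipe error may appear
instead.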


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
