[Beowulf] copying big files (Henning Fehrmann)
David Mathog
mathog at caltech.edu
Mon Aug 18 08:38:09 PDT 2008
Henning Fehrmann wrote:
>
> I spread successfully a 10G file to 50 nodes. The rate was 140Mb/s for
> nettee and a bit slower using dolly.
> I guess it was due to a busy node somewhere in the chain.
> Increasing the number of clients up to 100 failed in both cases.
>
> For nettee I got:
> nettee: fatal error writing to child: Connection reset by peer
>
> I will do more systematic test the next days.
> David Mathog, are you interested in bug reports?
Yes, please.
If memory serves, you will see that error whenever a child node, or
nettee on that child, crashes. For instance, if you "kill -9" nettee on
a child, the parent should see that error. The command option -colwf will
let the chain continue if the error is caused by a full disk or a failing
stdout pipe. The option -conwf should let the chain continue the transfer
down to the node just above the failed one, and it should tell you which
node failed, so long as -v is used with the appropriate bits set.
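
To illustrate how a parent notices a dead child, here is a minimal Python
sketch. It is not nettee code, and it assumes only that the parent and child
talk over an ordinary TCP connection on a POSIX-like system. The names
child_that_dies and send_until_failure are made up for this example. The
"child" accepts one connection and then drops it with a reset, which
approximates a killed process; the parent keeps writing until the kernel
reports the failure.

```python
import socket
import struct
import threading
import time


def child_that_dies(listener):
    """Accept one connection, then drop it abruptly, like a killed process."""
    conn, _ = listener.accept()
    # SO_LINGER with a zero timeout makes close() send a TCP RST instead of
    # an orderly FIN. This only approximates a crashed peer (assumption).
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def send_until_failure(max_chunks=500):
    """Write chunks to the 'child'; return the name of the error seen, or None."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    child = threading.Thread(target=child_that_dies, args=(listener,))
    child.start()

    parent = socket.create_connection(("127.0.0.1", port))
    error_name = None
    try:
        for _ in range(max_chunks):
            parent.sendall(b"x" * 1024)
            time.sleep(0.01)  # give the reset time to reach the parent
    except (ConnectionResetError, BrokenPipeError) as exc:
        # Which of the two is raised depends on the OS and on timing.
        error_name = type(exc).__name__
    finally:
        parent.close()
        child.join()
        listener.close()
    return error_name


if __name__ == "__main__":
    print("parent saw:", send_until_failure())
```

On Linux the first failed write typically reports "Connection reset by peer"
and later writes report a broken pipe, so the sketch accepts either one.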
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech