[Beowulf] copying data between clusters

Hearns, John john.hearns at mclaren.com
Fri Mar 5 10:05:38 PST 2010

> I'd like to parallelize that across multiple nodes to drive the aggregate
> up.
> I was hoping someone would pop up and say, hey, use this magical piece of
> software (which I'm unable to locate).

My recommendation would also be to use an external storage device. A
USB drive would be useful, and I have been involved in a couple of
industrial projects where data was brought to a cluster on an
external USB drive. It is, as people say, quite an efficient way to
transfer the data.
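
As a rough illustration of why a shipped drive can compete with a network
link, the sketch below works out the average throughput of moving a given
amount of data in a given door-to-door time. The 1 TB size and 24 hour
transit time are assumed figures chosen only for the arithmetic, not numbers
from any of the projects mentioned above.

```python
def effective_throughput_mbit_s(data_bytes: float, transit_hours: float) -> float:
    """Average throughput, in Mbit/s, of moving data_bytes in transit_hours."""
    seconds = transit_hours * 3600
    bits = data_bytes * 8
    return bits / seconds / 1e6


# Assumed, illustrative figures: a 1 TB drive that takes 24 hours door to
# door, including the time to copy data on and off the drive.
drive_bytes = 1e12
transit_hours = 24

print(round(effective_throughput_mbit_s(drive_bytes, transit_hours), 1))  # 92.6
```

Under those assumptions the drive averages roughly 93 Mbit/s sustained, so
whether shipping wins depends on what the WAN link can actually hold for a
full day.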

I gather that for high-def digital cinema a RAID array is physically
shipped to the cinema. I guess that also helps with data security, as
you could do some sort of encryption on the drives, though I might be
wrong.

In the digital media world, there are some fast parallel SCP boxes which
are an industry standard. I gather they cost $$$$ but do make transfers
fast. I forget the name, and if they don't really do parallel SCP,
forgive me - it's something along those lines.

Re. moving data to/from a cluster over a WAN link, I did look at this.
You can set up a FUSE filesystem running over SSH. This actually works
quite well from the point of view of ease of setup and usability,
but I didn't try any serious data transfer over it - and of course it
cannot be faster than SSH anyway!
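
One FUSE filesystem that runs over SSH is sshfs. Whether that is the exact
tool meant above is an assumption. The sketch below only builds the mount
and unmount command lines rather than running them. The user, host and
paths are hypothetical placeholders.

```python
import shlex


def sshfs_mount_command(user, host, remote_dir, mount_point):
    """Build an sshfs command line: sshfs [user@]host:[dir] mountpoint."""
    return ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]


def sshfs_unmount_command(mount_point):
    """Build the usual Linux unmount command for a FUSE mount."""
    return ["fusermount", "-u", mount_point]


# Hypothetical names, for illustration only.
mount = sshfs_mount_command("jane", "cluster.example.org", "/data", "/mnt/cluster")
umount = sshfs_unmount_command("/mnt/cluster")

# Print shell-safe versions; pass the lists to subprocess.run() to execute.
print(" ".join(shlex.quote(part) for part in mount))
print(" ".join(shlex.quote(part) for part in umount))
```

Once mounted, the remote directory can be copied from with ordinary tools,
with all traffic still going through a single SSH connection.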

I did also have a look at the types of tools used by grids for bulk data
transfer, but not much more than looking.
Here's an interesting link I found:  http://fasterdata.es.net/tools.html

PS. You don't say how you are transferring the data - if via rsync, have
you looked at the encryption options you are using?
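
To make the rsync point concrete: when rsync runs over SSH, the cipher is
chosen by the ssh command that rsync is told to use via -e. The sketch below
builds such a command line without running it. The cipher name, host and
paths are assumptions for illustration. Which ciphers are available, and
which is cheapest on a given CPU, depends on the SSH version installed
(`ssh -Q cipher` lists them on recent OpenSSH).

```python
import shlex


def rsync_over_ssh_command(src, dest, cipher=None):
    """Build an rsync command line that runs over ssh, optionally with a
    specific cipher selected through ssh's -c option."""
    ssh_cmd = ["ssh"]
    if cipher:
        ssh_cmd += ["-c", cipher]
    # rsync's -e option takes the remote shell command as a single string.
    remote_shell = " ".join(shlex.quote(part) for part in ssh_cmd)
    return ["rsync", "-a", "-e", remote_shell, src, dest]


# Hypothetical source, destination and cipher choice.
cmd = rsync_over_ssh_command(
    "/data/", "jane@cluster.example.org:/data/", cipher="aes128-ctr"
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Timing the same transfer with two or three different ciphers is a quick way
to see whether encryption, rather than the link, is the bottleneck.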

John Hearns

