[Beowulf] copying data between clusters

Joe Landman landman at scalableinformatics.com
Fri Mar 5 08:00:03 PST 2010


Michael Di Domenico wrote:
> How does one copy large (20TB) amounts of data from one cluster to another?
> 
> Assuming that each node in the cluster can only do about 30MB/sec
> between clusters and i want to preserve the uid/gid/timestamps, etc
> 
> I know how i do it, but i'm curious what methods other people use...

I am biased of course, but Fedex-net with one of these: 
http://scalableinformatics.com/jackrabbit

1GB @ 30 MB/s is about 33s.  1TB @ 30 MB/s is about 33000s, or more 
than 1/3 of a day.  20TB @ 30 MB/s ... you are looking at ~7.7 days to write.

If you have a 1GB/s disk write speed (less than the above unit can do), 
1TB takes ~1000s and 20TB takes ~20000s, about 1/4 of a day.
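
For anyone who wants to redo the arithmetic with their own numbers, here 
is a rough sketch in Python.  It assumes decimal units (1 TB = 10^12 
bytes) and a perfectly sustained rate with no protocol or filesystem 
overhead, so treat the results as lower bounds.  The function and 
constant names are made up for illustration.

```python
# Back-of-the-envelope transfer times. Assumes decimal units and a
# perfectly sustained rate (no protocol or filesystem overhead).
SECONDS_PER_DAY = 86_400
MB = 10**6
GB = 10**9
TB = 10**12


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in seconds to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    rates = (("30 MB/s", 30 * MB), ("1 GB/s", 1 * GB))
    sizes = (("1 TB", 1 * TB), ("20 TB", 20 * TB))
    for rate_label, rate in rates:
        for size_label, size in sizes:
            secs = transfer_seconds(size, rate)
            days = secs / SECONDS_PER_DAY
            print(f"{size_label} @ {rate_label}: {secs:,.0f} s = {days:.2f} days")
```

Swap in your own measured rate; a single stream at 30 MB/s is what makes 
20TB take about a week.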

If the clusters are close enough (same data center), this could be 
shared storage, but you will need a fast network between them.  If the 
clusters are too far apart for a direct connection, chances are that 
30 MB/s is an optimistic estimate for moving data between them.

BTW: 30 MB/s sounds suspiciously like either a) 1GbE sustained NFS speed 
for some nodes or b) the speed of an IDE drive.

> 
> Just a general survey...


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


