[Beowulf] Storage

Robert G. Brown rgb at phy.duke.edu
Thu Oct 7 17:27:13 PDT 2004

On Thu, 7 Oct 2004, Robert G. Brown wrote:

> In case the above wasn't clear, think:
> a) Run 1 day to 1 week, generate some 100+ GB per CPU on node local
> storage;

I hate to reply to myself, but I meant "per node" -- on a hundred node
dual CPU cluster, generate as much as 2 TB of raw data a week, which
reduces to maybe 0.2 TB of data a week in hundreds of files.  Multiply
by 50 and we'll fill 10+ TB in a year, in tens of thousands of files (or
more).  And this is the lower-bound estimate: it is likely off by a
factor of 2-4 already, and certain to be off by even more as the cluster
scales up over the next few years to as many as 500 sustained nodes,
each cranking out data according to this prescription, with per-node
output itself amplified by Moore's Law, i.e. growing exponentially over
time.
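For concreteness, the arithmetic above can be sketched as a back-of-the-envelope calculation. The numbers (2 TB/week raw, a roughly 10x reduction to 0.2 TB/week kept, about 50 production weeks per year, and the 2-4x lower-bound fudge factor) are the ones quoted in the text, not measured figures:

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
RAW_TB_PER_WEEK = 2.0      # ~100 GB/node/week over a hundred-node cluster
REDUCTION = 0.1            # raw -> reduced/kept data (2 TB -> 0.2 TB)
WEEKS_PER_YEAR = 50        # production weeks per year

reduced_tb_per_week = RAW_TB_PER_WEEK * REDUCTION        # 0.2 TB/week kept
tb_per_year = reduced_tb_per_week * WEEKS_PER_YEAR       # 10 TB/year

# Lower bound is "likely off by a factor of 2-4":
print(tb_per_year, tb_per_year * 2, tb_per_year * 4)     # 10, 20, 40 TB/year
```

So even before the cluster grows, the plausible range is already 10-40 TB of reduced data per year.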

This is why I'm worried about scaling so much.  Even the genomics people
have some sort of linear bound on their data production rate.  Here,
exponential growth in productivity (hopefully) matches the expected
exponential growth in storage capacity, so the problem might not get
relatively easier... and if the exponents mismatch, it could get a lot
worse.
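The exponent-mismatch worry can be illustrated with a toy model: if data production grows by one annual factor and storage per dollar by another, the relative cost of keeping up grows as the ratio of the two, compounded yearly. The growth rates below are made up purely for illustration, not measured figures:

```python
# Toy model of the exponent mismatch; growth rates are hypothetical.
data_growth = 1.6      # assume data production grows 60%/year
storage_growth = 1.4   # assume storage per dollar grows 40%/year

cost_ratio = 1.0       # relative cost of storing a year's output
for year in range(5):
    print(year, round(cost_ratio, 2))
    cost_ratio *= data_growth / storage_growth
```

Even a modest mismatch compounds: here the relative cost grows about 14% a year, so it roughly doubles in five or six years despite storage itself getting cheaper.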


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
