[Beowulf] Storage
Robert G. Brown
rgb at phy.duke.edu
Thu Oct 7 17:27:13 PDT 2004
On Thu, 7 Oct 2004, Robert G. Brown wrote:
> In case the above wasn't clear, think:
>
> a) Run 1 day to 1 week, generate some 100+ GB per CPU on node local
> storage;
I hate to reply to myself, but I meant "per node" -- on a hundred-node
dual-CPU cluster we generate as much as 2 TB of raw data a week, which
reduces to maybe 0.2 TB of kept data a week in hundreds of files.
Multiply by 50 weeks and we'll fill 10+ TB in a year, in tens of
thousands of files (or more). And this is the lower-bound estimate,
likely off by a factor of 2-4 and certain to be off by even more as the
cluster scales up over the next few years to as many as 500 nodes
sustained, all cranking out data according to this prescription, with
the per-node output amplified by Moore's Law by exponentially
increasing factors.
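Back of the envelope, in case anyone wants to play with the numbers (a
rough sketch only -- the per-week rate, the 50-week year, and the
annual growth factor are my guesses above, not measurements):

    # Rough capacity projection for the workload described above.
    # All parameters are estimates from this message, not measurements.

    reduced_tb_per_week = 0.2   # ~2 TB raw/week reduced to ~0.2 TB kept
    weeks_per_year = 50
    annual_growth = 2.0         # assumed Moore's-Law-ish doubling of output

    total_tb = 0.0
    for year in range(1, 6):
        yearly_tb = reduced_tb_per_week * weeks_per_year * annual_growth ** (year - 1)
        total_tb += yearly_tb
        print(f"year {year}: +{yearly_tb:.0f} TB this year, {total_tb:.0f} TB cumulative")

With those (guessed) inputs the cumulative total passes 300 TB by year
five, which is the point of the exercise.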
This is why I'm worried about scaling so much. Even the genomics people
have some sort of linear bound on their data production rate. Our
workload has exponential growth in productivity that (hopefully)
matches the expected growth in storage capacity, so it might not get
relatively easier... and if the exponents mismatch, it could get a lot
worse.
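To make the exponent worry concrete (a toy sketch; the growth rates
below are illustrative, not predictions):

    # Toy comparison: if data production and affordable storage both grow
    # exponentially, only the ratio of the two rates matters.

    data_growth = 2.0      # assumed: data produced per year doubles annually
    storage_growth = 1.6   # assumed: capacity per dollar grows ~60%/year

    ratio = 1.0
    for year in range(1, 6):
        ratio *= data_growth / storage_growth
        print(f"year {year}: need {ratio:.2f}x the storage budget of year 0")

Even a modest mismatch in the exponents compounds: in this toy case the
required budget roughly triples in five years.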
rgb
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email: rgb at phy.duke.edu