[Beowulf] PetaBytes on a budget, take 2
Greg Lindahl
lindahl at pbm.com
Thu Jul 21 22:04:36 PDT 2011
On Fri, Jul 22, 2011 at 12:33:37AM -0400, Mark Hahn wrote:
> storage isn't about performance any more. ok, hyperbole, a little.
> but even a cheap disk does > 100 MB/s, and in all honesty, there are
> not tons of people looking for bandwidth more than a small multiplier
> of that. sure, a QDR fileserver wants more than a couple disks,
> and if you're an iops-head, you're going flash anyway.
Over in the big data world, we're all about disk bandwidth, because we
take the computation to the data. When we're reading something for a
Map/Reduce job, we can easily drive 800 MB/s off of 8 disks in a
single node, and for many jobs the most expensive thing about the job
is reading. Good thing we have 3 copies of every bit of data: with
three times as many disks able to serve any given block, that gives
us 1/3 the runtime.
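A back-of-envelope sketch of that arithmetic, in Python. The 100 MB/s
per disk and 8 disks per node come from the numbers above. The dataset
size, the function name, and the assumption that reads spread perfectly
evenly over every disk holding a copy (no scheduling or network
overhead) are mine, for illustration only.

```python
def scan_seconds(dataset_mb, replicas, disks_per_node=8, mb_per_s_per_disk=100.0):
    """Idealized time to read a dataset once.

    Assumes each replica lives on a different node and that the read
    is split evenly across all disks on all nodes holding a copy.
    """
    aggregate_mb_per_s = replicas * disks_per_node * mb_per_s_per_disk
    return dataset_mb / aggregate_mb_per_s

# Hypothetical 2.4 TB dataset (2,400,000 MB), chosen only to make the numbers round.
one_copy = scan_seconds(2_400_000, replicas=1)     # one node, 800 MB/s
three_copies = scan_seconds(2_400_000, replicas=3)  # three nodes, 2400 MB/s
print(one_copy, three_copies)
```

Under these idealized assumptions the three-copy scan takes one third
of the single-copy time. A real job will see less than that once
scheduling and stragglers enter the picture.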
Writing, not so happy: the extra copies have to cross the network, and
network bandwidth is a lot more expensive than disk bandwidth.
Some data manipulations in HPC are like Map/Reduce. For example,
shooting a movie using saved state files is embarrassingly parallel.
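A minimal sketch of why the movie case parallelizes so easily, in
Python. Everything here is hypothetical: render_frame is a stand-in for
a real renderer, the "state" is a toy (step, values) tuple rather than
a real saved state file, and a thread pool stands in for running each
task on the node that already holds the file.

```python
from concurrent.futures import ThreadPoolExecutor

def render_frame(state):
    """Hypothetical renderer: turns one saved state into one frame."""
    step, values = state
    # Placeholder "image": just the peak value in this state.
    return step, max(values)

def render_movie(states, workers=4):
    # Map: each frame depends only on its own state, so the tasks
    # never need to talk to each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        frames = list(pool.map(render_frame, states))
    # Reduce: the only combining step is putting frames in time order.
    return sorted(frames)

# Toy states, deliberately out of order.
states = [(2, [0.1, 0.9]), (0, [0.3, 0.2]), (1, [0.5, 0.4])]
frames = render_movie(states)
print(frames)
```

The map step has no shared data, which is what makes it a natural fit
for sending the computation to wherever each state file lives.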
The first system I heard about which took computation to the data was
from SDSC, long before GOOG was founded.
-- greg