[Beowulf] PetaBytes on a budget, take 2

Greg Lindahl lindahl at pbm.com
Thu Jul 21 22:04:36 PDT 2011


On Fri, Jul 22, 2011 at 12:33:37AM -0400, Mark Hahn wrote:

> storage isn't about performance any more.  ok, hyperbole, a little.
> but even a cheap disk does > 100 MB/s, and in all honesty, there are
> not tons of people looking for bandwidth more than a small multiplier
> of that.  sure, a QDR fileserver wants more than a couple disks,
> and if you're an iops-head, you're going flash anyway.

Over in the big data world, we're all about disk bandwidth, because we
take the computation to the data. When we're reading something for a
Map/Reduce job, we can easily drive 800 MB/s off of 8 disks in a
single node, and for many jobs the most expensive thing about the job
is reading. Good thing we have 3 copies of every bit of data: with
three places to read any block from, that gives us 1/3 the runtime.
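A back-of-the-envelope sketch of that read arithmetic, in Python. The
100 MB/s per disk and 8 disks per node come from this thread; the
dataset size, the node count, and the idea that 3 copies simply let 3x
as many nodes share the scan are illustrative assumptions, not
measurements.

```python
def scan_seconds(dataset_mb, nodes_holding_one_copy, replicas=1,
                 disks_per_node=8, mb_per_s_per_disk=100.0):
    """Idealized time to scan a dataset once, ignoring CPU and network.

    Assumes (hypothetically) that every replica sits on a different node,
    so `replicas` copies let `replicas` times as many nodes share the read.
    """
    node_bw = disks_per_node * mb_per_s_per_disk   # 8 * 100 = 800 MB/s per node
    readers = nodes_holding_one_copy * replicas    # nodes that can serve the scan
    return dataset_mb / (readers * node_bw)

# Hypothetical job: 8 TB of input, one copy spread over 10 nodes.
one_copy = scan_seconds(8_000_000, 10, replicas=1)
three_copies = scan_seconds(8_000_000, 10, replicas=3)
print(one_copy, three_copies, three_copies / one_copy)
```

The ratio comes out at 1/3 only while the job is purely disk-bound and
the extra nodes are otherwise idle.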

Writing, not so happy. The extra copies have to cross the network, and
network bandwidth is a lot more expensive than disk bandwidth.
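A rough sketch of why writes hurt, again in Python. The 800 MB/s of
disk bandwidth per node is the figure from earlier in this message; the
125 MB/s network link (roughly gigabit Ethernet) and the 100 GB of
output per node are assumed numbers for illustration only, and the
model assumes a symmetric cluster where every node writes the same
amount.

```python
def write_seconds(new_data_mb, replicas=3,
                  disk_mb_per_s=800.0, net_mb_per_s=125.0):
    """Idealized per-node time to write new data with replication.

    Every copy lands on some disk; all but the local copy must cross the
    network. Returns (bottleneck_time, disk_time, net_time) in seconds.
    """
    disk_time = replicas * new_data_mb / disk_mb_per_s
    net_time = (replicas - 1) * new_data_mb / net_mb_per_s
    return max(disk_time, net_time), disk_time, net_time

# Hypothetical: each node produces 100 GB of output.
total, disk_time, net_time = write_seconds(100_000)
print(total, disk_time, net_time)
```

Under these assumed numbers the network, not the disks, sets the write
time; a faster interconnect shifts the balance.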

Some data manipulations in HPC are like Map/Reduce. For example,
shooting a movie using saved state files is embarrassingly parallel.

The first system I heard about which took computation to the data was
from SDSC, long before GOOG was founded.

-- greg





