[Beowulf] hadoop
Eugen Leitl
eugen at leitl.org
Tue Nov 27 08:34:12 PST 2012
On Tue, Nov 27, 2012 at 11:13:25AM -0500, Ellis H. Wilson III wrote:
> Are these problems EP such that they could be entirely Map tasks?
Not at all. This particular application is to derive optimal
feature extraction algorithms from high-resolution volumetric data
(mammal or primate connectome). At ~8 nm, even a mouse will
produce a mountain of structural data.
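To put a rough number on "mountain": a quick back-of-envelope sketch, assuming a mouse brain volume of ~0.5 cm^3 and 1 byte per voxel (both illustrative figures, not from the post):

```python
# Back-of-envelope estimate of raw volumetric data for a mouse brain
# imaged at 8 nm isotropic resolution. The brain volume (~0.5 cm^3)
# and 1 byte/voxel are assumptions for illustration only.
brain_volume_m3 = 0.5e-6            # ~0.5 cm^3 in m^3 (assumed)
voxel_side_m = 8e-9                 # 8 nm isotropic voxel
voxel_volume_m3 = voxel_side_m ** 3
voxels = brain_volume_m3 / voxel_volume_m3
raw_bytes = voxels * 1              # assume 1 byte per voxel
print(f"{voxels:.2e} voxels, ~{raw_bytes / 1e18:.1f} EB raw")
```

Under those assumptions the raw volume lands around 10^18 voxels, i.e. roughly an exabyte, which bears out the "mountain of structural data" remark.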
> Because otherwise you are going to have a fairly significant shuffle
> stage in your MapReduce application that will lead to overheads moving
> the data over the network and in and out of memory/disk/etc. Shuffling
> can be a real PITA, but it tends to be present in most real-world
> applications I've run into.
The extracted feature set would be much more compact than the
raw dataset (at least 10^3 to 10^6 times more compact), and could
be loaded over the GBit/s network into the main cluster with
no problems.
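A quick sanity check on the transfer time for the compacted feature set, taking the 10^3..10^6 compaction factors from above and assuming ~1 EB of raw data (an illustrative figure, not stated in the post) over a 1 Gbit/s link:

```python
# Rough transfer-time sketch for the extracted feature set over a
# 1 Gbit/s link. The ~1 EB raw-data figure is an assumption; the
# 10^3..10^6 compaction range is from the discussion above.
raw_bytes = 1e18                     # ~1 EB raw volume (assumed)
link_bps = 1e9                       # 1 Gbit/s
for factor in (1e3, 1e6):
    feature_bytes = raw_bytes / factor
    seconds = feature_bytes * 8 / link_bps
    print(f"{factor:.0e}x compaction: {feature_bytes:.1e} B, "
          f"{seconds / 3600:.1f} h")
```

At the high end of the compaction range (10^6) the transfer takes a couple of hours; at the low end (10^3) it is months, so the "no problems" claim rests on compaction well beyond 10^3.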
> Maybe you weren't referring to using Hadoop, in which case this
> basically looks just like the FAWN project I had mentioned in the past
> that came out of CMU (with the addition of tiered storage).
http://www.cs.cmu.edu/~fawnproj/ ?
Cute, and probably the right application for the
Adapteva project. If the boards are credit-card
sized you can mount them on a rackmount tray
along with a 24-port switch, with a couple of
fans.
However, I'm thinking about a board you directly plug
your SATA or SAS hard drive into, probably using
the hard drive itself (which should be 5k rpm then)
as a heatsink.