[Beowulf] Hadoop
Gerry Creager
gerry.creager at tamu.edu
Fri Jan 2 04:10:10 PST 2009
Greg Lindahl wrote:
> On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:
>
>> We've a user who has requested its installation on one of our clusters,
>> a high-throughput system.
>
> You didn't say anything about what they wanted to do. Hadoop is
> designed to store a lot of data, and then enable what we HPC people
> would call nearly-embarrassingly-parallel computation with good
> locality -- it takes shards of mapreduce computation to run on the
> same system as the disk shards being processed.
Ah, but there's the problem. We've divined what they intend... we
think... but they didn't originally tell us.
The PI involved is a relatively new, but reasonably experienced CS prof
associated with our bioinformatics crowd. She and her students intend
to sift through plant genomic data for patterns (we think, based on her
known affiliations). *I* suspect she's interested, as well, because she
read about Hadoop and wants to play.
> This means you'll have to dedicate systems over the long term to store
> the data (much like PVFS), and all of these systems will have to be a
> part of their mapreduce jobs. So if your queue system can run
> whole-cluster jobs easily, no problem.
Can it? Yes. Is that the intent of the cluster? No. The cluster is
configured as a high-throughput system with a non-blocking gigabit
backplane. Each node has 8 cores, and all jobs are scheduled on a
per-node basis. Each node DOES have local disk (this isn't an
opportunity to reopen THAT religious war), so we theoretically could
use the Hadoop file system, except that it'd likely break our cluster
design. Instead, we're looking at
Hadoop On Demand (http://hadoop.apache.org/core/docs/r0.17.2/hod.html).
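
For anyone on the list who hasn't looked at mapreduce, here is a rough
sketch of the pattern Greg describes above: a map step that runs
independently on each shard of data, followed by a reduce step that
merges the partial results. It assumes (and this is only our guess at
the workload) that the job amounts to counting short subsequences in
sequence data. The shard contents, the function names and k=3 are all
made up for illustration, and this is plain Python, not the Hadoop API.

```python
from collections import Counter
from functools import reduce


def map_shard(sequences, k=3):
    """Map step: count every length-k subsequence in one shard.

    In Hadoop this would run on the node that holds the shard on its
    local disk; here it is just an ordinary function call.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-shard counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical data: two shards, as if stored on two different nodes.
shards = [["ACGTAC", "GTACGT"], ["ACGACG"]]

partials = [map_shard(shard) for shard in shards]  # independent per shard
totals = reduce_counts(partials)                   # single merge at the end
```

The map calls share nothing, which is what makes the job nearly
embarrassingly parallel; only the final merge needs to see results from
every shard.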
> If, instead, they're just looking for a simple way to do
> embarrassingly parallel computations, without lots of persistent data,
> then you can probably point them at something easier and more friendly
> to your queue system.
Yeah, and I've been trying, but someone else promised them it'd be made
available without talking to the guys who have to install and support
it, because it "looks" like valuable computer science.
gerry
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843