[Beowulf] Hadoop

Gerry Creager gerry.creager at tamu.edu
Fri Jan 2 04:10:10 PST 2009


Greg Lindahl wrote:
> On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:
> 
>> We've a user who has requested its installation on one of our clusters,  
>> a high-throughput system.
> 
> You didn't say anything about what they wanted to do. Hadoop is
> designed to store a lot of data, and then enable what we HPC people
> would call nearly-embarrassingly-parallel computation with good
> locality -- it arranges for shards of a mapreduce computation to run on the
> same systems as the disk shards being processed.
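
A minimal sketch of the data-locality idea in the quoted paragraph follows. It is a toy, in-process model and makes several assumptions. Each map task is grouped with the node whose local disk already holds its shard, and a reduce step then sums the emitted pairs. The shard names, the node names (node01, node02), and the k-mer counting task are hypothetical. The k-mer task was picked only to echo the plant-genomics use case. This is not Hadoop's API: Hadoop does the equivalent scheduling across real machines and HDFS blocks.

```python
from collections import Counter, defaultdict

# Hypothetical shard placement: which node's local disk holds each shard.
SHARD_LOCATIONS = {"shard-0": "node01", "shard-1": "node02", "shard-2": "node01"}

# Hypothetical shard contents: short DNA reads stored on those disks.
SHARD_DATA = {
    "shard-0": ["ACGTAC", "GTACGT"],
    "shard-1": ["TTACGA"],
    "shard-2": ["ACGACG"],
}


def schedule_maps(locations):
    """Group map tasks by the node that already stores their shard (data locality)."""
    plan = defaultdict(list)
    for shard, node in sorted(locations.items()):
        plan[node].append(shard)
    return dict(plan)


def map_kmers(reads, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in every read."""
    for read in reads:
        for i in range(len(read) - k + 1):
            yield read[i:i + k], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each k-mer."""
    totals = Counter()
    for kmer, count in pairs:
        totals[kmer] += count
    return totals


def run_job(locations, data, k=3):
    """Schedule map tasks by locality, run them, then reduce the results."""
    plan = schedule_maps(locations)
    emitted = []
    for node, shards in plan.items():
        # In a real cluster these map tasks would execute on `node` itself,
        # reading each shard from local disk; here they simply run in-process.
        for shard in shards:
            emitted.extend(map_kmers(data[shard], k))
    return plan, reduce_counts(emitted)


if __name__ == "__main__":
    plan, counts = run_job(SHARD_LOCATIONS, SHARD_DATA)
    print(plan)
    print(counts.most_common(3))
```

The consequence Greg describes falls out of the sketch. Because map tasks go to wherever the shards live, every node holding data has to take part in every job. That sits poorly with a scheduler that hands out whole nodes to independent jobs.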

Ah, but there's the problem.  We've divined what they intend... we 
think... but they didn't originally tell us.

The PI involved is a relatively new, but reasonably experienced CS prof 
associated with our bioinformatics crowd.  She and her students intend 
to sift through plant genomic data for patterns (we think, based on her 
known affiliations).  *I* suspect she's interested, as well, because she 
read about Hadoop and wants to play.

> This means you'll have to dedicate systems over the long term to store
> the data (much like PVFS), and all of these systems will have to be a
> part of their mapreduce jobs. So if your queue system can run
> whole-cluster jobs easily, no problem.

Can it? Yes.  Is that the intent of the cluster? No.  The cluster is 
configured as a high-throughput system with a gigabit non-blocking 
backplane: 8 cores/node, and all jobs are scheduled on a per-node basis. 
Each node DOES have local disk (this isn't an opportunity to reopen THAT 
religious war), so we theoretically could use the Hadoop file system, 
except that it'd likely break our cluster design.  Instead, we're looking at 
Hadoop On Demand (http://hadoop.apache.org/core/docs/r0.17.2/hod.html).

> If, instead, they're just looking for a simple way to do
> embarrassingly parallel computations, without lots of persistent data,
> then you can probably point them at something easier and more friendly
> to your queue system.

Yeah, and I've been trying, but someone else promised them it'd be made 
available without talking to the guys who have to install and support 
it, because it "looks" like valuable computer science.

gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843



More information about the Beowulf mailing list