[Beowulf] Hadoop

Sat Dec 27 07:59:54 PST 2008

Jeff,

I'm an old guy, and don't mind top-posts!

Thanks for the insight!
gerry

Jeff Layton wrote:
> Sorry for top-posting (I hate these on-line email tools...)
> 
> Did the person requesting Hadoop ever say why they wanted it? For 
> example, do they have code written in MapReduce or do they think that 
> Hadoop will give them faster throughput than something else?
> 
> Hadoop is a project that really has 2 parts to it - an open-source 
> MapReduce implementation, and a file system. From people I've talked to, 
> the MapReduce part is used far more than the file system. But I've 
> talked to some of the developers of the file system and there are some 
> people who use the file system.
> 
> In general the file system is basically a virtual file system a la PVFS, 
> GlusterFS or any object based storage (Panasas, Lustre). However, it 
> understands the idea of locality - that is, where useful storage is in 
> relation to the compute part of the problem. The idea being that you can 
> reduce the time to transmit the data because the storage is closer. But, 
> in general, the improvement you get is due to the network topology, not 
> necessarily the file system itself. That's because, in general, 
> MapReduce systems have network topologies with bottlenecks all over the 
> place because they don't really need a full bisection bandwidth 
> network everywhere. So for example they may have good bandwidth to a 
> switch within the rack, but outside the rack, the bandwidth is not so 
> hot. But again, these are generalizations, and the details are always in 
> the implementation.
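
A toy sketch of the locality idea above, not Hadoop's actual code. It assumes a hypothetical two-level topology (node, rack) and simply prefers the copy of the data that is topologically closest to the compute node. The names Node, distance and pick_replica are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical host name
    rack: str  # hypothetical rack identifier


def distance(a: Node, b: Node) -> int:
    """Crude topology distance: 0 = same node, 1 = same rack, 2 = other rack."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 1
    return 2


def pick_replica(compute: Node, replicas: List[Node]) -> Node:
    """Return the replica holder closest to the node doing the computing."""
    return min(replicas, key=lambda r: distance(compute, r))


# Example: the data lives on n2 (same rack as n1) and on n3 (another rack).
n1 = Node("n1", "rackA")
n2 = Node("n2", "rackA")
n3 = Node("n3", "rackB")
closest = pick_replica(n1, [n3, n2])
```

Here a task running on n1 would read from n2, so the traffic stays on the in-rack switch and avoids the thinner link between racks.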
> 
> HadoopFS (for lack of a better phrase on my part) is really designed for 
> MapReduce codes - transactional codes. So if the person's code(s) fit 
> this model, then it might be an interesting experiment to try. 
> Otherwise, there are much better file systems for HPC :)
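
For anyone unsure whether a code fits the MapReduce model, here is a minimal in-memory sketch of the pattern in plain Python. It does not use Hadoop's API; map_phase, shuffle, reduce_phase and run_job are illustrative names, and word counting is just the customary example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, value) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially over a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["a b a", "b c"])
```

If the user's problem can be phrased as independent per-record map steps followed by a per-key reduction like this, it is a candidate for Hadoop; if it needs tightly coupled communication between tasks, it probably is not.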
> 
> BTW - I saw Karen's post about using Java with HadoopFS. Be sure to pay 
> attention to that since getting a good 64-bit Java implementation for 
> Linux is not always easy. There are a few out there (Sun has an early 
> access program to a 64-bit Java) but the reports I've heard are that 
> it's still early.
> 
> Hope this helps.
> 
> Jeff
> 
> 
> ------------------------------------------------------------------------
> *From:* Gerry Creager <gerry.creager at tamu.edu>
> *To:* Beowulf Mailing List <beowulf at beowulf.org>
> *Sent:* Friday, December 26, 2008 6:16:04 PM
> *Subject:* [Beowulf] Hadoop
> 
> The subject line says it all: Hadoop:  Anyone got any experience with it
> on clusters (OK, so Google does, but that really wasn't the question,
> was it?).
> 
> We've a user who has requested its installation on one of our clusters,
> a high-throughput system.  I'm a bit concerned that it's not gonna be
> real compatible with, say, Torque/Maui and Gluster, unless we were to
> install Xen across the whole cluster and instantiate it within Xen VMs.
> 
> However, before I push all MY fears out into the discussion I'd prefer
> to see if anyone else has experience and can shed light on compatibility.
> 
> Thanks, Gerry
> -- 
> Gerry Creager -- gerry.creager at tamu.edu
> Texas Mesonet -- AATLT, Texas A&M University   
> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843