gerry.creager at tamu.edu
Sat Dec 27 07:59:54 PST 2008
I'm an old guy and don't mind top-posts!
Thanks for the insight!
Jeff Layton wrote:
> Sorry for top-posting (I hate these on-line email tools...)
> Did the person requesting Hadoop ever say why they wanted it? For
> example, do they have code written in MapReduce or do they think that
> Hadoop will give them faster throughput than something else?
> Hadoop is a project that really has 2 parts to it - an open-source
> MapReduce implementation, and a file system. From people I've talked to,
> the MapReduce part is used far more than the file system. But I've
> talked to some of the developers of the file system and there are some
> people who use the file system.
> In general the file system is basically a virtual file system a la PVFS,
> GlusterFS or any object-based storage (Panasas, Lustre). However, it
> understands the idea of locality - that is, where useful storage is in
> relation to the compute part of the problem. The idea is that you can
> reduce the time to transmit the data because the storage is closer. But,
> in general, the improvement you get is due to the network topology, not
> necessarily the file system itself. That's because, in general,
> MapReduce systems have network topologies with bottlenecks all over the
> place because they don't really need a full bisection bandwidth
> network everywhere. So, for example, they may have good bandwidth to a
> switch within the rack, but outside the rack, the bandwidth is not so
> hot. But again, these are generalizations, and the details are always in
> the implementation.
> HadoopFS (for lack of a better phrase on my part) is really designed for
> MapReduce codes - transactional codes. So if the person's code(s) fit
> this model, then it might be an interesting experiment to try.
> Otherwise, there are much better file systems for HPC :)
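For anyone checking whether a code "fits this model", here is a minimal sketch of the MapReduce pattern itself. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count task are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_job(["hadoop on a cluster", "a cluster file system"]))
```

If a job can be phrased as independent per-record map calls followed by per-key reductions like this, it is a candidate for the model; tightly coupled codes that exchange data at every step generally are not.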
> BTW - I saw Karen's post about using Java with HadoopFS. Be sure to pay
> attention to that since getting a good 64-bit Java implementation for
> Linux is not always easy. There are a few out there (Sun has an early
> access program to a 64-bit Java) but the reports I've heard are that
> it's still early.
> Hope this helps.
> *From:* Gerry Creager <gerry.creager at tamu.edu>
> *To:* Beowulf Mailing List <beowulf at beowulf.org>
> *Sent:* Friday, December 26, 2008 6:16:04 PM
> *Subject:* [Beowulf] Hadoop
> The subject line says it all: Hadoop: Anyone got any experience with it
> on clusters (OK, so Google does, but that really wasn't the question,
> was it?).
> We've a user who has requested its installation on one of our clusters,
> a high-throughput system. I'm a bit concerned that it's not gonna be
> real compatible with, say, Torque/Maui and Gluster, unless we were to
> install Xen across the whole cluster and instantiate it within Xen VMs.
> However, before I push all MY fears out into the discussion I'd prefer
> to see if anyone else has experience and can shed light on compatibility.
> Thanks, Gerry
> Gerry Creager -- gerry.creager at tamu.edu
> Texas Mesonet -- AATLT, Texas A&M University
> Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843