[Beowulf] hadoop

Douglas Eadline deadline at eadline.org
Sat Feb 7 10:38:36 PST 2015


> Hello Jonathan.
> Here is a good document to get you thinking.
> http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf
>
> Although Doug said "Oh, and Hadoop clusters are not going to supplant your
> HPC cluster"

I should have continued, ... and there will be overlap.

--
Doug


> I believe that there is an ongoing effort to converge Cloud computing (e.g.
> Hadoop) and HPC.
> The key points are laid out in the link I provided.
> To me the convergence is summarized in:
> -strong scalability.
> -reliability/fault tolerance.
> -programming productivity.
> -standardized/cheap infrastructure.
>
> Joshua
>
> ------ Original Message ------
> Received: 09:20 AM PST, 02/07/2015
> From: Jonathan Aquilina <jaquilina at eagleeyet.net>
> To: Douglas Eadline <deadline at eadline.org>
> Cc: Beowulf <beowulf at beowulf.org>
> Subject: Re: [Beowulf] hadoop
>
>>
>>
>> Hey Douglas,
>>
>> Thanks for the information. What has me curious is whether it can be used,
>> for example, in applications which don't involve large amounts of data.
>>
>> If you or anyone has any resources, like ebooks or useful websites, to
>> read up on it, it would be great if you could send them. The reason is
>> that where I am working we deal with lots of live telemetry in terms of
>> positioning etc. Since we are going to be moving our system away from
>> Windows to open source technologies such as angular.js for the web site
>> of our platform, as well as mongodb and nodejs, we will be implementing
>> Hadoop from Amazon to take advantage of Amazon's Elastic MapReduce.
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> On 2015-02-07 17:33, Douglas Eadline wrote:
>>
>> > Jonathan
>> >
>> > I understand your confusion. Hadoop and Big Data reached
>> > "overused but not well understood" status years ago.
>> >
>> > First, Hadoop started out as a MapReduce engine. This all
>> > changed with Hadoop V2 and YARN (Yet Another Resource Negotiator).
>> > Hadoop V2 can be considered a platform for applications that need
>> > parallel access to large amounts of unstructured data (i.e. raw data
>> > not in a traditional database). It can also be used with its own
>> > database, HBase, which is based on Google's Bigtable.
>> >
>> > The idea is this: a "Hadoop" cluster has a large amount of storage
>> > using HDFS (or possibly another parallel filesystem). This is often
>> > referred to as the "Data Lake." Raw data is dumped in the lake. There
>> > is no ETL (Extract, Transform, and Load) step. Various Hadoop YARN
>> > frameworks use this data. YARN provides a very dynamic resource
>> > allocation model and the ability to provide data locality to your
>> > application (i.e. the traditional MapReduce idea was "move the
>> > computation to the data").
>> >
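A minimal sketch of the "move the computation to the data" idea described above. This is not YARN's scheduler or any Hadoop API; the block names, node names, and the `assign_tasks` helper are all hypothetical, and the greedy policy is only one simple way to prefer local replicas.

```python
# Hypothetical block placement: block id -> nodes that hold a replica.
BLOCK_LOCATIONS = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-c", "node-a"],
}


def assign_tasks(block_locations, free_slots):
    """Greedily place one task per block, preferring a node with a local replica.

    free_slots maps node name -> number of free task slots (an assumption
    standing in for whatever a real resource manager tracks).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block, replicas in sorted(block_locations.items()):
        # Nodes that both hold the block and still have a free slot.
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: slots[n])
            kind = "local"
        else:
            # Fall back to any node with capacity; the data must then be
            # read over the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            node = max(candidates, key=lambda n: slots[n])
            kind = "remote"
        slots[node] -= 1
        assignment[block] = (node, kind)
    return assignment


if __name__ == "__main__":
    for block, placement in assign_tasks(
        BLOCK_LOCATIONS, {"node-a": 1, "node-b": 1, "node-d": 2}
    ).items():
        print(block, placement)
```

In this toy run the first two blocks are processed on nodes that already store them, and only the third falls back to a remote read because no replica holder has a free slot.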
>> > Thus in a Hadoop V2 cluster you can have MapReduce applications (which
>> > support many of the popular apps like Pig and Hive). It also supports
>> > Spark, Storm, Giraph and even MPI (not the most efficient, but it
>> > works). There are many other applications being ported to YARN.
>> >
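For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) using word count as the example. It is plain Python rather than the Hadoop MapReduce API, and the function names are illustrative assumptions; on a real cluster the map and reduce calls would run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

Because each `map_phase` call looks at only one line and each `reduce_phase` call looks at only one key, both stages can be spread over many machines, which is what makes the model a fit for large data sets.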
>> > Second, Big Data is usually defined by Volume, Velocity, and Variety.
>> > The definition seems to be whatever a vendor wants it to be, however.
>> > It reminds me of products that suddenly became "grid ready" in years
>> > past. Again, such designations mean as much as "now works with binary
>> > data."
>> >
>> > Finally, if you are interested in Hadoop YARN you can check out the
>> > book "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing
>> > with Apache Hadoop 2" (I helped write it). There are also many online
>> > resources. The first chapter of the book has the history of Hadoop as
>> > written by one of the developers. It is quite interesting to read and
>> > helps dispel many of the Hadoop myths. You can read this chapter for
>> > free here:
>> >
>> > http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf [2]
>> >
>> > That is enough Hadoop for Saturday morning. Oh, and Hadoop clusters
>> > are not going to supplant your HPC cluster.
>> >
>> > --
>> > Doug
>> >
>> >> Can someone explain to me what exactly the purpose of hadoop is and
>> >> what we mean when we say big data? Is this for data storage and
>> >> retrieval? Number crunching?
>> >>
>> >> --
>> >> Regards,
>> >> Jonathan Aquilina
>> >> Founder Eagle Eye T
>> >>
>> >> --
>> >> Mailscanner: Clean
>> >> _______________________________________________
>> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> >> To change your subscription (digest mode or unsubscribe) visit
>> >> http://www.beowulf.org/mailman/listinfo/beowulf [1]
>> >
>> > --
>> > Doug
>>
>>
>> Links:
>> ------
>> [1] http://www.beowulf.org/mailman/listinfo/beowulf
>> [2] http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf


--
Doug
