[Beowulf] hadoop

Jonathan Aquilina jaquilina at eagleeyet.net
Sat Feb 7 09:20:02 PST 2015


Hey Douglas, 

Thanks for the information. What has me curious is whether it can be used,
for example, in applications which don't involve large amounts of data.

If you or anyone else has any resources, like ebooks or useful websites
to read up on it, it would be great if you could send them. The reason is
that where I am working we deal with lots of live telemetry in terms of
positioning etc. Since we are going to be moving our system away from
Windows to open source technologies such as angular.js for the web site
of our platform, as well as mongodb and nodejs, we will be implementing
Hadoop from Amazon to take advantage of Amazon's Elastic MapReduce.

Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-07 17:33, Douglas Eadline wrote: 

> Jonathan
> I understand your confusion. Hadoop and Big Data reached
> overused but not well understood status years ago.
> First, Hadoop started out as a MapReduce engine. This all
> changed with Hadoop V2 and YARN (Yet Another Resource Negotiator).
> Hadoop V2 can be considered a platform for applications that need
> parallel access to large amounts of unstructured data (i.e. raw data not
> in a traditional database). It can also be used with its own database, HBase,
> which is based on Google's Bigtable.
> The idea is this: a "Hadoop" cluster has a large amount of storage
> using HDFS (or possibly another parallel filesystem). This is often referred
> to as the "Data Lake." Raw data is dumped in the lake. There is no
> ETL (Extract, Transform, and Load) step. Various Hadoop YARN frameworks use
> this data. YARN provides a very dynamic resource allocation model and the
> ability to provide data locality to your application (i.e. the traditional
> MapReduce idea was "move the computation to the data").
> Thus in a Hadoop V2 cluster you can have MapReduce applications (which
> support many of the popular apps like Pig and Hive). It also supports
> Spark, Storm, Giraph and even MPI (not the most efficient, but it works).
> There are many other applications being ported to YARN.
> Second, Big Data is usually defined by Volume, Velocity, and Variety.
> The definition seems to be whatever a vendor wants it to be, however.
> It reminds me of products that suddenly became "grid ready" in years past.
> Again, such designations mean as much as "now works with binary data."
> Finally, if you are interested in Hadoop YARN you can check out the book
> "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with
> Apache Hadoop 2" (I helped write it). There are also many online resources.
> The first chapter of the book has the history of Hadoop as written by
> one of the developers. It is quite interesting to read and helps dispel
> many of the Hadoop myths. You can read this chapter for free here:
> http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf
> That is enough Hadoop for Saturday morning. Oh, and Hadoop clusters
> are not going to supplant your HPC cluster.
> --
> Doug
>> Can someone explain to me what exactly the purpose of hadoop is and
>> what we mean when we say big data? Is this for data storage and
>> retrieval? Number crunching?
>>
>> --
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> --
>> Mailscanner: Clean
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf [1]

[1] http://www.beowulf.org/mailman/listinfo/beowulf
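
For readers new to the model, the MapReduce idea Doug describes in the quoted message can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it runs in a single process, it does not use any Hadoop API, and it does not show data locality or YARN scheduling. The record format, the sample data, and the names map_phase, shuffle, reduce_phase, and run_job are all hypothetical, chosen to resemble the positioning telemetry mentioned above.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw records ("vehicle_id,latitude,longitude"), stored as-is
# with no ETL step, the way raw data would sit in the "data lake".
RAW_LINES = [
    "truck-1,35.90,14.51",
    "truck-2,35.88,14.40",
    "truck-1,35.91,14.52",
    "truck-3,36.04,14.24",
    "truck-1,35.92,14.53",
]


def map_phase(line):
    """Emit (key, value) pairs for one raw record: here (vehicle_id, 1)."""
    vehicle_id = line.split(",")[0]
    yield vehicle_id, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key: here, count the position reports."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the given lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Prints the number of position reports seen per vehicle.
    print(run_job(RAW_LINES))
```

In a real cluster the map calls would run in parallel on the nodes that already hold each block of the input, which is the "move the computation to the data" point; the sketch keeps only the map, group, and reduce structure.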