[Beowulf] hadoop

Joshua Mora joshua_mora at usa.net
Sat Feb 7 10:10:31 PST 2015


Hello Jonathan.
Here is a good document to get you thinking:
http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf

Although Doug said "Oh, and Hadoop clusters are not going to supplant your HPC
cluster", I believe that there is an ongoing effort to converge cloud computing
(e.g. Hadoop) and HPC.
The key points are laid out in the document linked above.
To me the convergence comes down to:
-strong scalability.
-reliability/fault tolerance.
-programming productivity.
-standardized/cheap infrastructure.

Joshua

------ Original Message ------
Received: 09:20 AM PST, 02/07/2015
From: Jonathan Aquilina <jaquilina at eagleeyet.net>
To: Douglas Eadline <deadline at eadline.org>
Cc: Beowulf <beowulf at beowulf.org>
Subject: Re: [Beowulf] hadoop

>  
> 
> Hey Douglas, 
> 
> Thanks for the information. What has me curious is whether it can be used,
> for example, in applications which don't involve large amounts of data.
> 
> If you or anyone else has any resources, like ebooks or useful websites, to
> read up on it, it would be great if you could send them. The reason is that
> where I am working we deal with lots of live telemetry in terms of
> positioning etc. Since we are going to be moving our system away from
> Windows to open source technologies such as angular.js for the web site of
> our platform, as well as mongodb and nodejs, we will be implementing Hadoop
> from Amazon to take advantage of Amazon's Elastic MapReduce.
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-07 17:33, Douglas Eadline wrote: 
> 
> > Jonathan
> > 
> > I understand your confusion. Hadoop and Big Data reached
> > overused-but-not-well-understood status years ago.
> > 
> > First, Hadoop started out as a MapReduce engine. This all
> > changed with Hadoop V2 and YARN (Yet Another Resource Negotiator).
> > Hadoop V2 can be considered a platform for applications that need
> > parallel access to large amounts of unstructured data (i.e. raw data not
> > in a traditional database). It can also be used with its own database, HBase,
> > which is based on Google's Bigtable.
> > 
> > The idea is this: a "Hadoop" cluster has a large amount of storage
> > using HDFS (or possibly another parallel filesystem). This is often
> > referred to as the "Data Lake." Raw data is dumped in the lake. There is
> > no ETL (Extract, Transform, and Load) step. Various Hadoop YARN frameworks
> > use this data. YARN provides a very dynamic resource allocation model and
> > the ability to provide data locality to your application (i.e. the
> > traditional MapReduce idea was "move the computation to the data").
> > 
> > Thus in a Hadoop V2 cluster you can have MapReduce applications (which
> > support many of the popular apps like Pig and Hive). It also supports
> > Spark, Storm, Giraph and even MPI (not the most efficient, but it works).
> > There are many other applications being ported to YARN.
> > 
> > Second, Big Data is usually defined by Volume, Velocity, and Variety.
> > The definition seems to be whatever a vendor wants it to be, however.
> > It reminds me of products that suddenly became "grid ready" in years past.
> > Again, such designations mean as much as "now works with binary data."
> > Finally, if you are interested in Hadoop YARN you can check out the book
> > "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with
> > Apache Hadoop 2" (I helped write it). There are also many online resources.
> > The first chapter of the book has the history of Hadoop as written by
> > one of the developers. It is quite interesting to read and helps dispel
> > many of the Hadoop myths. You can read this chapter for free here:
> > 
> > http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf [2]
> > 
> > That is enough Hadoop for Saturday morning. Oh, and Hadoop clusters
> > are not going to supplant your HPC cluster.
> > 
> > --
> > Doug
> > 
> > >> Can someone explain to me what exactly the purpose of hadoop is and what
> > >> we mean when we say big data? Is this for data storage and retrieval?
> > >> Number crunching?
> > >>
> > >> --
> > >> Regards,
> > >> Jonathan Aquilina
> > >> Founder Eagle Eye T
> > >>
> > >> --
> > >> Mailscanner: Clean
> > >>
> > >> _______________________________________________
> > >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > >> To change your subscription (digest mode or unsubscribe) visit
> > >> http://www.beowulf.org/mailman/listinfo/beowulf [1]
> > 
>  
> 
> Links:
> ------
> [1] http://www.beowulf.org/mailman/listinfo/beowulf
> [2] http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf
