[Beowulf] are there any known attempts to apply hadoop BigData techniques to weather modelling?
Prentice Bisbal
prentice.bisbal at rutgers.edu
Tue Feb 17 14:39:38 PST 2015
On 02/17/2015 05:16 PM, Ellis H. Wilson III wrote:
> On 02/17/2015 04:56 PM, Prentice Bisbal wrote:
>> Why do you think 'Big Data' techniques would be applicable to this?
>>
>> A large amount of data != big data.
>
> Heh. Let's not pretend like 'big data' means anything of substance
> now :D.
>
>> 'Big Data' techniques are typically for finding trends in unstructured
>> data from multiple sources, whereas the output of scientific simulations
>> is usually from a single source in some sort of structured format. I
>> just don't see any applicability here whatsoever.
>
> I would argue this is perhaps a bit overly specific. This might be
> the typical use case, but certainly there is no reason why Hadoop and
> MapReduce couldn't be used to do simple filtering of scientific
> simulation output. If you were looking for places in a huge output
> file where temperature is between some set of ranges and elevation
> also had a specific value, I could certainly see value in applying an
> easily programmable scaling framework to basically "smart grep"
> through your data. Hadoop/MR could certainly help you do that.
I was intentionally being specific, to counter the general lack of
specificity surrounding the term 'Big Data'. ;)
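
For concreteness, the "smart grep" filter described above can be sketched
as a map-only pass over the records. This is a minimal sketch in plain
Python, not Hadoop code and not anything from SciHadoop or the HDF5 paper.
The record layout (Cell), the field names, the units, and the function
names map_cell and smart_grep are all hypothetical, chosen only to show
the shape of the job:

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """One grid cell of simulation output (hypothetical layout)."""
    lat: float
    lon: float
    elevation_m: float
    temperature_k: float


def map_cell(cell: Cell, t_min: float, t_max: float,
             elevation_m: float) -> Iterator[Tuple[str, Cell]]:
    """Map step: emit a (key, value) pair only for cells that match."""
    in_range = t_min <= cell.temperature_k <= t_max
    if in_range and cell.elevation_m == elevation_m:
        yield ("match", cell)


def smart_grep(cells: Iterable[Cell], t_min: float, t_max: float,
               elevation_m: float) -> List[Cell]:
    """Run the map step over every record and collect the emitted values."""
    return [value
            for cell in cells
            for _key, value in map_cell(cell, t_min, t_max, elevation_m)]


if __name__ == "__main__":
    # Made-up sample records, purely for illustration.
    sample = [
        Cell(40.0, -74.0, 1500.0, 275.0),
        Cell(40.1, -74.0, 1500.0, 290.0),
        Cell(40.2, -74.0, 1200.0, 275.0),
    ]
    for hit in smart_grep(sample, 270.0, 280.0, 1500.0):
        print(hit)
```

Each record is tested independently of every other record, which is why
this kind of filtering fits the MapReduce model: the predicate would live
in the mapper, and a job like this would likely not need a reduce step at
all.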
>
> Many output formats for scientific data, such as HDF5, are
> well-structured, as you mentioned. This doesn't mean you have a good
> file system or good parallel programming paradigm to do stupid-simple
> things with this afterwards. You just have a good container format.
> Hadoop could provide the other bits you need. A paper from the HDF5
> group actually does a decent job of pointing out these kinds of
> differences, how you might get HDF5 containers in and out of HDFS and
> what impacts performance:
>
> http://www.hdfgroup.org/HDF5/faq/hadoop.html
>
> As they note in the paper, a recent work (I was lucky enough to talk
> in the same slot as the author at SC a year back) called SciHadoop
> works directly with NetCDF formatted files, so that could be another
> option. Whether or not the source is available for SciHadoop is beyond
> my knowledge, but a quick google would likely give you that answer.
>
> If you are asking, "should I do weather simulation using Hadoop or
> some other big data framework," my answer is a resounding NO. There
> are VERY different (often far more limited) semantics and guarantees
> in MR than other parallel programming paradigms, and you will almost
> certainly get burned if you try to shove a climate-shaped peg through
> the square hole that is MR. This is probably what Prentice was
> getting at.
That's EXACTLY what I was getting at. A hammer is a good tool for
nailing pieces of wood together, but I wouldn't use it to cut down a tree.
Prentice