[Beowulf] are there any known attempts to apply hadoop BigData techniques to weather modelling?

Prentice Bisbal prentice.bisbal at rutgers.edu
Tue Feb 17 14:39:38 PST 2015


On 02/17/2015 05:16 PM, Ellis H. Wilson III wrote:
> On 02/17/2015 04:56 PM, Prentice Bisbal wrote:
>> Why do you think 'Big Data' techniques would be applicable to this?
>>
>> A large amount of data != big data.
>
> Heh.  Let's not pretend like 'big data' means anything of substance 
> now :D.
>
>> 'Big Data' techniques are typically for finding trends in unstructured
>> data from multiple sources, whereas the output of scientific simulations
>> is usually from a single source in some sort of structured format. I
>> just don't see any applicability here whatsoever.
>
> I would argue this is perhaps a bit overly specific.  This might be 
> the typical use case, but certainly there is no reason why Hadoop and 
> MapReduce couldn't be used to do simple filtering of scientific 
> simulation output.  If you were looking for places in a huge output 
> file where temperature is between some set of ranges and elevation 
> also had a specific value, I could certainly see value in applying an 
> easily programmable scaling framework to basically "smart grep" 
> through your data.  Hadoop/MR could certainly help you do that.
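As a rough illustration of the "smart grep" idea, here is a minimal sketch of a Hadoop Streaming-style mapper in Python. It assumes the simulation output has already been flattened to one CSV text record per line (lat,lon,elevation_m,temperature_c). Real HDF5 or NetCDF output would first need converting, or a format-aware input layer. The record layout, the threshold values, and the names `mapper` and `main` are all hypothetical.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical selection criteria for the "smart grep".
TEMP_MIN_C = 15.0
TEMP_MAX_C = 25.0
TARGET_ELEVATION_M = 500.0
ELEVATION_TOL_M = 1.0


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Yield "lat,lon<TAB>temperature" for each matching record.

    Assumes (hypothetically) one text record per line:
    lat,lon,elevation_m,temperature_c
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            lat, lon, elev, temp = line.split(",")
            elev_m, temp_c = float(elev), float(temp)
        except ValueError:
            continue  # skip headers and malformed records
        at_elevation = abs(elev_m - TARGET_ELEVATION_M) <= ELEVATION_TOL_M
        in_range = TEMP_MIN_C <= temp_c <= TEMP_MAX_C
        if at_elevation and in_range:
            # Streaming convention: key and value separated by a tab.
            yield f"{lat},{lon}\t{temp_c}"


def main() -> None:
    """Read records from stdin and write matches to stdout, as Hadoop Streaming expects."""
    for record in mapper(sys.stdin):
        print(record)
```

Because each record is tested independently, this filtering fits MapReduce's model well. A script that calls `main()` from its entry point could be passed as the `-mapper` of a map-only streaming job (`-numReduceTasks 0`), since pure filtering needs no reduce phase.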
I was intentionally being specific, to counter some of the vagueness 
surrounding the term 'Big Data'. ;)
>
> As you mentioned, however, many output formats for scientific data, 
> such as HDF5, are well-structured.  This doesn't mean you have a good
> file system or good parallel programming paradigm to do stupid-simple 
> things with this afterwards.  You just have a good container format.  
> Hadoop could provide the other bits you need. A paper from the HDF5 
> group actually does a decent job of pointing out these kinds of 
> differences, how you might get HDF5 containers in and out of HDFS, and 
> what affects performance:
>
> http://www.hdfgroup.org/HDF5/faq/hadoop.html
>
> As they note in the paper, a recent work called SciHadoop (I was lucky 
> enough to talk in the same slot as the author at SC a year back) 
> works directly with NetCDF-formatted files, so that could be another 
> option. I don't know whether the SciHadoop source is available, but a 
> quick Google search would likely answer that.
>
> If you are asking, "should I do weather simulation using Hadoop or 
> some other big data framework," my answer is a resounding NO. There 
> are VERY different (often far more limited) semantics and guarantees 
> in MR than in other parallel programming paradigms, and you will almost 
> certainly get burned if you try to shove a climate-shaped peg through 
> the square hole that is MR.  This is probably what Prentice was 
> getting at.

That's EXACTLY what I was getting at. A hammer is a good tool for 
nailing pieces of wood together, but I wouldn't use it to cut down a tree.

Prentice
