[Beowulf] Large amounts of data to store and process

Tony Brian Albers tba at kb.dk
Sun Mar 3 23:04:20 PST 2019


Hi Jonathan,

From my limited knowledge of the technologies, I would say that HBase
with file pointers to the files placed on HDFS would suit you well.

But if the files are log files, consider tools suited to analyzing
them, such as Kibana.
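To make the "file pointers" idea concrete, here is a minimal sketch of that pattern: HBase rows hold only small metadata plus an HDFS path, while the bulk sensor files stay on HDFS. All names (row-key layout, column qualifiers, paths) are hypothetical illustrations, and a plain dict stands in for a live table; with a real cluster you would use an HBase client instead.

```python
# Hypothetical sketch of the pointer schema: HBase stores metadata and an
# HDFS path per flight; the large raw files themselves live on HDFS.
# A plain dict stands in for an HBase table here, since no cluster is
# assumed; with a real cluster an HBase client would do the put/get.

def make_row(aircraft_id, flight_ts, hdfs_path, size_bytes):
    """Build one logical row: key = aircraft id + timestamp (keeps one
    aircraft's flights contiguous in key order), value = pointer + metadata."""
    row_key = f"{aircraft_id}#{flight_ts}"
    return row_key, {
        "meta:hdfs_path": hdfs_path,       # pointer to the bulk data on HDFS
        "meta:size_bytes": str(size_bytes),
    }

table = {}  # stand-in for an HBase table

key, cols = make_row("A320-0007", "2019-03-01T09:15",
                     "/flights/A320-0007/2019-03-01.parquet", 7_100_000_000)
table[key] = cols

# A lookup touches only the small pointer row; the multi-GB file is then
# fetched from HDFS separately, which keeps HBase scans fast.
print(table["A320-0007#2019-03-01T09:15"]["meta:hdfs_path"])
```

The point of the split is that HBase stays responsive for lookups and range scans over row keys, while HDFS does what it is good at: streaming large immutable files.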

/tony


On Mon, 2019-03-04 at 06:55 +0000, Jonathan Aquilina wrote:
> Hi Tony,
> 
> Sadly I can't go into much detail because I'm under an NDA. At this
> point we have around 250 GB of sample data for the prototype, but the
> volume depends on the aircraft type: larger aircraft and longer
> flights generate a lot more data, since they have more sensors and
> log more than the sample data I have. The 250 GB of sample data
> covers 35 aircraft of the same type.
> 
> Regards,
> Jonathan
> 
> -----Original Message-----
> From: Tony Brian Albers <tba at kb.dk> Sent: 04 March 2019 07:48
> To: beowulf at beowulf.org; Jonathan Aquilina <jaquilina at eagleeyet.net>
> Subject: Re: [Beowulf] Large amounts of data to store and process
> 
> On Mon, 2019-03-04 at 06:38 +0000, Jonathan Aquilina wrote:
> > Good Morning all,
> > 
> > I am working on a project that I sadly can't go into much detail
> > about, but quite large amounts of data will be ingested by this
> > system, and the output would need to be returned efficiently to the
> > end user in around 10 minutes or so. I am in discussions with
> > another partner involved in this project about the best way
> > forward.
> > 
> > For me, given the amount of data (and it is a huge amount of data),
> > an RDBMS such as PostgreSQL would be a major bottleneck.
> > Another option that was considered is flat files, and I think the
> > best fit for that would be a Hadoop cluster with HDFS. But in the
> > case of HPC, how can such an environment help with ingesting and
> > analyzing large amounts of data? Would said flat files be put on a
> > SAN/NAS or similar and accessed through an NFS share for
> > computational purposes?
> > 
> > Regards,
> > Jonathan
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin 
> > Computing To change your subscription (digest mode or unsubscribe) 
> > visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
> Good morning,
> 
> Around here, we're using HBase for similar purposes. We have a bunch
> of smaller nodes storing the data, and all the management nodes
> (Ambari, HDFS NameNodes, etc.) are VMs.
> 
> Our nodes are configured with a maximum of 2 cores per disk spindle
> and 4 GB of memory per core. This seems to do the trick and is
> pretty responsive.
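The 2-cores-per-spindle / 4-GB-per-core rule of thumb above is easy to turn into quick sizing arithmetic. The spindle counts below are illustrative examples, not a description of the actual cluster.

```python
# Back-of-the-envelope sizing from the rule of thumb: at most 2 cores per
# disk spindle, and 4 GB of RAM per core. Example inputs are hypothetical.

def node_sizing(spindles, cores_per_spindle=2, gb_ram_per_core=4):
    """Return (max_cores, ram_gb) for a worker node with the given spindles."""
    cores = spindles * cores_per_spindle
    return cores, cores * gb_ram_per_core

# e.g. a worker with 12 data disks:
cores, ram = node_sizing(12)
print(cores, ram)  # 24 cores, 96 GB RAM
```

The intent of the rule is to keep CPUs from outrunning disk I/O: HBase/HDFS workloads are usually spindle-bound, so adding cores beyond the disks' throughput just adds idle waiters.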
> 
> But to be able to provide better advice, you will probably need to go
> into a bit more detail about what types of data you will be storing
> and which kind of calculations you want to perform.
> 
> /tony 
> 
> 
> --
> Tony Albers - Systems Architect - IT Development Royal Danish
> Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
> Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142

-- 
Tony Albers - Systems Architect - IT Development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142

