[Beowulf] Large amounts of data to store and process

Fred Youhanaie fly at anydata.co.uk
Mon Mar 4 01:18:32 PST 2019


Hi Jonathan,

It seems you're collecting metrics and time series data. Perhaps a time series database (TSDB) is an option for you. There are a few of these out there, but I don't have any personal recommendation.
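
As a rough illustration of the kind of query a TSDB is built for, here is a minimal Python sketch that averages timestamped readings into fixed-width time buckets. The function name `downsample`, the sample readings and the 60-second bucket width are all made up for illustration. A real TSDB would normally do this server-side through its own query interface, so treat this only as a picture of the idea.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of the bucket it falls in.
        buckets[ts - ts % bucket_seconds].append(value)
    # Return bucket start -> mean value, ordered by bucket start.
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical readings from one sensor: (seconds since start, value).
readings = [(0, 10.0), (20, 12.0), (59, 14.0), (60, 20.0), (95, 22.0)]
print(downsample(readings, 60))
```

Keying the stored data by sensor and timestamp is what lets this sort of range-and-aggregate query stay cheap as the volume grows.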

Cheers,
Fred

On 04/03/2019 07:04, Jonathan Aquilina wrote:
> These would be numerical data such as integers or floating point numbers.
> 
> -----Original Message-----
> From: Tony Brian Albers <tba at kb.dk>
> Sent: 04 March 2019 08:04
> To: beowulf at beowulf.org; Jonathan Aquilina <jaquilina at eagleeyet.net>
> Subject: Re: [Beowulf] Large amounts of data to store and process
> 
> Hi Jonathan,
> 
> From my limited knowledge of the technologies, I would say that HBase with file pointers to the files placed on HDFS would suit you well.
> 
> But if the files are log files, consider tools suited to analyzing those, such as Kibana.
> 
> /tony
> 
> 
> On Mon, 2019-03-04 at 06:55 +0000, Jonathan Aquilina wrote:
>> Hi Tony,
>>
>> Sadly I can't go into much detail as I am under an NDA. At this point,
>> with the prototype, we have around 250 GB of sample data, but the
>> volume depends on the type of aircraft. Larger aircraft and longer
>> flights will generate a lot more data, as they have more sensors and
>> will log more than the sample data that I have. The sample data is
>> 250 GB for 35 aircraft of the same type.
>>
>> Regards,
>> Jonathan
>>
>> -----Original Message-----
>> From: Tony Brian Albers <tba at kb.dk>
>> Sent: 04 March 2019 07:48
>> To: beowulf at beowulf.org; Jonathan Aquilina <jaquilina at eagleeyet.net>
>> Subject: Re: [Beowulf] Large amounts of data to store and process
>>
>> On Mon, 2019-03-04 at 06:38 +0000, Jonathan Aquilina wrote:
>>> Good Morning all,
>>>
>>> I am working on a project that I sadly can't go into much detail
>>> about, but there will be quite large amounts of data ingested by
>>> this system, which would need to be efficiently returned as output to
>>> the end user in around 10 min or so. I am in discussions with
>>> another partner involved in this project about the best way forward
>>> on this.
>>>
>>> For me, given the amount of data (and it is a huge amount of data),
>>> an RDBMS such as PostgreSQL would be a major bottleneck. Another
>>> option that was considered is flat files, and I think the best fit
>>> for that would be a Hadoop cluster with HDFS. But in the case of HPC,
>>> how can such an environment help with ingesting and analysing
>>> large amounts of data? Would the flat files be put on a SAN/NAS
>>> and accessed through an NFS share for computational purposes?
>>>
>>> Regards,
>>> Jonathan
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>> Computing To change your subscription (digest mode or unsubscribe)
>>> visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
>> Good morning,
>>
>> Around here, we're using HBase for similar purposes. We have a bunch
>> of smaller nodes storing the data, and all the management nodes (Ambari,
>> HDFS namenodes etc.) are VMs.
>>
>> Our nodes are configured so that we have a maximum of 2 cores per disk
>> spindle and 4 GB of memory for each core. This seems to do the trick and
>> is pretty responsive.
>>
>> But to be able to provide better advice, you will probably need to go
>> into a bit more detail about what types of data you will be storing
>> and which kind of calculations you want to perform.
>>
>> /tony
>>
>>
>> --
>> Tony Albers - Systems Architect - IT Development Royal Danish Library,
>> Victor Albecks Vej 1, 8000 Aarhus C, Denmark
>> Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142
> 
> 

