[Beowulf] Large amounts of data to store and process

Jonathan Aquilina jaquilina at eagleeyet.net
Sun Mar 3 22:38:28 PST 2019

Good Morning all,

I am working on a project that I sadly can't go into much detail about, but it will ingest quite large amounts of data, and the output would need to be efficiently returned to the end user within around 10 minutes or so. I am in discussions with another partner involved in this project about the best way forward.

For me, given the amount of data (and it is a huge amount of data), an RDBMS such as PostgreSQL would be a major bottleneck. Another option that was considered is flat files, and I think the best fit for that would be a Hadoop cluster with HDFS. But in the case of HPC, how can such an environment help in terms of ingesting and analyzing large amounts of data? Would said flat files be put on a SAN/NAS and accessed through an NFS share for computational purposes?
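To make the flat-file-over-shared-filesystem idea concrete, here is a minimal sketch (my own illustration, not anything from the project) of how compute nodes could divide one large flat file into byte ranges aligned on record boundaries, so each node reads only its slice over the NFS mount. The function names and the line-count "analytics" are placeholders.

```python
import os

def chunk_offsets(path, n_workers):
    """Split a file into byte ranges, one per worker, aligned to newlines,
    so each compute node can process a disjoint slice of the same file."""
    size = os.path.getsize(path)
    step = size // n_workers
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_workers):
            f.seek(i * step)
            f.readline()          # advance to the next newline boundary
            offsets.append(f.tell())
    offsets.append(size)
    return [(offsets[i], offsets[i + 1]) for i in range(n_workers)]

def process_range(path, start, end):
    """Count lines in [start, end) -- a stand-in for real per-node analytics."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            if not f.readline():
                break
            count += 1
    return count
```

In an HPC setting the ranges would be handed out via the scheduler (one array-job task per range, say), and because every node sees the same path on the shared filesystem, no data has to be copied around first. Whether NFS bandwidth holds up at scale is exactly the open question, though, which is why parallel filesystems like Lustre or HDFS's data locality are usually brought up.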

