<div dir="auto">What does your overall design look like?</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 4, 2019, 5:19 AM Jonathan Aquilina <<a href="mailto:jaquilina@eagleeyet.net">jaquilina@eagleeyet.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Michael,<br>
<br>
As previously mentioned, we don’t really need anything indexed, so I am thinking flat files are the way to go. My only concern is the performance of large flat files. Isn’t dealing with large flat files what HDFS is for?<br>
<br>
On 04/03/2019, 14:13, "Beowulf on behalf of Michael Di Domenico" <<a href="mailto:beowulf-bounces@beowulf.org" target="_blank" rel="noreferrer">beowulf-bounces@beowulf.org</a> on behalf of <a href="mailto:mdidomenico4@gmail.com" target="_blank" rel="noreferrer">mdidomenico4@gmail.com</a>> wrote:<br>
<br>
even though you've alluded to this being time series data, is there a<br>
requirement that you have to index into the data, or do you just read the<br>
data end-to-end and do some calculations?<br>
<br>
i routinely face these kinds of issues, but we're not indexing into the<br>
data, so having things in hdfs or rdbms doesn't give us any benefit.<br>
we pull all the data into organized flat files and blow through them<br>
with HTCondor. if the researcher wants to tweak the code they do and<br>
then just rerun the whole simulation.<br>
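<br>
(a minimal sketch of the "read each flat file end-to-end and do some calculations" pattern described above. the CSV layout, the `value` column, and the function names are assumptions for illustration, not the poster's actual code. each file is processed independently, so one file could map to one batch job.)<br>

```python
import csv
from pathlib import Path


def scan_flat_file(path, value_column="value"):
    """Read one delimited flat file front to back and return (row count, mean).

    The column name is a hypothetical placeholder; rows are streamed one at a
    time, so memory use stays flat regardless of file size.
    """
    count = 0
    total = 0.0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            total += float(row[value_column])
            count += 1
    mean = total / count if count else 0.0
    return count, mean


def scan_directory(directory, pattern="*.csv"):
    """Scan every matching file independently and collect per-file results."""
    return {
        path.name: scan_flat_file(path)
        for path in sorted(Path(directory).glob(pattern))
    }
```

rerunning after a code tweak just means calling `scan_directory` again on the same files; nothing has to be reloaded into a database first.<br>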
<br>
sometimes that's minutes, sometimes days. but in either case the time<br>
to develop code is always much shorter because the data is in flat<br>
files and easier for my "non-programmer" programmers. no need to<br>
learn hdfs/hadoop or sql.<br>
<br>
if you need to index the data and jump around, hdfs is probably still<br>
not the best solution unless you want to index the files, and 250gb isn't<br>
really big enough to warrant an hdfs cluster. i've generally found that<br>
unless you're dealing with multi-TB+ datasets you can't scale the<br>
hardware out enough to get the speedup. (yes, i know there are<br>
tweaks to change this, but i've found it's just simpler to buy a bigger<br>
lustre system)<br>
<br>
<br>
<br>
On Mon, Mar 4, 2019 at 1:39 AM Jonathan Aquilina<br>
<<a href="mailto:jaquilina@eagleeyet.net" target="_blank" rel="noreferrer">jaquilina@eagleeyet.net</a>> wrote:<br>
><br>
> Good Morning all,<br>
><br>
><br>
><br>
> I am working on a project that I sadly can't go into in much detail, but quite large amounts of data will be ingested by this system and will need to be efficiently returned as output to the end user in around 10 minutes or so. I am in discussions with another partner involved in this project about the best way forward.<br>
><br>
><br>
><br>
> Given the amount of data (and it is a huge amount), I think an RDBMS such as PostgreSQL would be a major bottleneck. Another option considered was flat files, and I think the best fit for that would be a Hadoop cluster with HDFS. But in the case of HPC, how can such an environment help with ingesting and analyzing large amounts of data? Would the flat files be put on a SAN/NAS or similar and accessed through an NFS share for computation?<br>
><br>
><br>
><br>
> Regards,<br>
><br>
> Jonathan<br>
><br>
> _______________________________________________<br>
> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank" rel="noreferrer">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
> To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer noreferrer" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
</blockquote></div>