[Beowulf] Re: Spark, Julia, OpenMPI etc. - all in one place

Oddo Da oddodaoddo at gmail.com
Tue Oct 13 11:48:27 PDT 2020


On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline <deadline at eadline.org>
wrote:

>
> The reality is almost all Analytics projects require multiple
> tools. For instance, Spark is great, but if you do some
> data munging of CSV files and want to store your results
> at scale you can't write a single file to your local file
> system. Often times you write it as a Hive table to HDFS
> (e.g. in Parquet format) so it is available for Hive SQL
> queries or for other tools to use.
>
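
For anyone following along, the pattern described above looks roughly like this in PySpark. This is only a sketch under assumptions: the function name, the CSV path, and the table name are made up for illustration, the "munging" is a placeholder dropna(), and it presumes a Spark installation that can reach a Hive metastore and HDFS.

```python
# Sketch only: csv_to_hive_parquet, the path and the table name are
# hypothetical, not taken from the thread.

def csv_to_hive_parquet(csv_path: str, table: str) -> None:
    """Read a CSV, do trivial cleanup, and save it as a Parquet-backed Hive table."""
    # Imported lazily so this file can still be loaded on a node without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("csv-to-hive-parquet")
        .enableHiveSupport()  # assumes a reachable Hive metastore
        .getOrCreate()
    )

    # Let Spark infer column types from the CSV header and contents.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Placeholder for the real data munging: drop rows with missing values.
    cleaned = df.dropna()

    # Spark writes a directory of Parquet part files (one per partition),
    # not a single file, and registers the table in the metastore.
    cleaned.write.mode("overwrite").format("parquet").saveAsTable(table)

    spark.stop()


# Example call (hypothetical locations):
# csv_to_hive_parquet("hdfs:///data/raw/events.csv", "analytics.events_clean")
```

Once the table exists, it can be queried from Hive SQL or read back by other tools, which is the point of not writing to the local file system.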

You can also commit to a database (but you can't have those running on a
traditional HPC cluster). What would be nice is HDFS running on a
traditional cluster, but that would break the whole "parallel filesystem
exposed as a single mount point" model. It is funny how these things
evolved apart from each other to the point where they are impossible to
marry, no?