[Beowulf] [External] Spark, Julia, OpenMPI etc. - all in one place
Prentice Bisbal
pbisbal at pppl.gov
Mon Oct 12 12:19:17 PDT 2020
I'm not an expert on Big Data at all, but I hear the name "Hadoop"
less and less these days. Where I work, most data analysts are using R,
Python, or Spark in the form of PySpark. For machine learning, most of
the researchers I support are using Python tools like TensorFlow or
PyTorch.
I don't know much about Julia replacing MPI, etc., but I wish I did. I
would like to know more about Julia.
Prentice
On 10/12/20 12:14 PM, Oddo Da wrote:
> Hello,
>
> I used to be in HPC back when we built beowulf clusters by hand ;) and
> wrote code in C/pthreads, PVM and MPI and back when anyone could walk
> into fields like bioinformatics, all that was needed was a pulse, some
> C and Perl and a desire to do ;-). Then I left for the private sector
> and stumbled into "big data" some years later - I wrote a lot of code
> in Spark and Scala, worked in infrastructure to support it etc.
>
> Then I went back (in 2017) to HPC. I was surprised to find that not
> much has changed - researchers and grad students still write code in
> MPI and C/C++ and maybe some Python or R for visualization or
> localized data analytics. I also noticed that it was not easy to
> "marry" things like big data with HPC clusters - tools like
> Spark/Hadoop do not really have the same underlying infrastructure
> assumptions as do things like MPI/supercomputers. However, I find it
> wasteful for a university to run separate clusters to support a data
> science/big data load vs traditional HPC.
>
> I then stumbled upon languages like Julia - I like its approach, code
> is data, visualization is easy, decent ML/DS tooling.
>
> How does it fare on a traditional HPC cluster? Are people using it to
> substitute their MPI loads? On the opposite side, has it caught up to
> Spark in terms of DS/ML quality of offering? In other words, can it be
> used as a one fell swoop unifying substitute for both opposing
> approaches?
>
> I realize that many people have already committed to certain
> tech/paradigms but this is mostly educational debt (if MPI or Spark on
> the other side is working for me, why go to something different?) -
> but is there anything substantial stopping new people with no debt
> starting out in a different approach (offerings like Julia)?
>
> I do not have too much experience with Julia (and hence may be barking
> up the wrong tree) - in that case I am wondering what people are doing
> to "marry" the loads of traditional HPC with "big data" as practiced
> by the commercial/industry entities on a single underlying hardware
> offering. I know there are things like Twister2 but it is unclear to
> me (from cursory examination) what it actually offers in the context
> of my questions above.
>
> Any input, corrections, schooling me etc. are appreciated.
>
> Thank you!
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov