[Beowulf] Re: Spark, Julia, OpenMPI etc. - all in one place

Oddo Da oddodaoddo at gmail.com
Tue Oct 13 15:29:16 PDT 2020


Doug, thank you for taking the time! Your Julia comments are in line with
my impression of it, hence the initial question I posed in this thread.
Thank you for all your insights.

On Tue, Oct 13, 2020 at 5:03 PM Douglas Eadline <deadline at eadline.org>
wrote:

>
> > On Tue, Oct 13, 2020 at 3:54 PM Douglas Eadline <deadline at eadline.org>
> > wrote:
> >
> >>
> >> It really depends on what you need to do with Hadoop or Spark.
> >> IMO many organizations don't have enough data to justify
> >> standing up a 16-24 node cluster system with a PB of HDFS.
> >>
> >
> > Excellent. If I understand what you are saying, there is simply no demand
> > to mix technologies, esp. in the academic world. OK. In your opinion and
> > independent of the Spark/HDFS discussion, why are we still only on OpenMPI
> > in the world of writing distributed code on HPC clusters? Why is there
> > nothing else gaining any significant traction? Is there no innovation in
> > exposing higher-level abstractions, hiding the details, and making it
> > easier to write correct code that is easier to reason about and does not
> > burden the writer with too much low-level detail? Is it just the amount of
> > investment in an existing knowledge base? Is it that there is nothing out
> > there to compel people to spend the time to learn it? Or is there nothing
> > there? Or maybe there is and I am just blissfully unaware? :)
> >
>
>
> I have been involved in HPC and parallel computing since the 1980s.
> Prior to MPI, every vendor had its own message-passing library. Initially,
> PVM (Parallel Virtual Machine) from Oak Ridge was developed so there
> would be some standard API to create parallel codes. It worked well
> but needed more. MPI was developed so parallel hardware vendors
> (not many back then) could standardize on a messaging framework
> for HPC. Since then, not a lot has moved the needle forward.
>
> Of course there are things like OpenMP, but these are not distributed
> tools.
>
> Another issue is the difference between "concurrent code" and
> parallel execution. Not everything that is concurrent needs
> to be executed in parallel and indeed, depending on
> the hardware environment you are targeting, these decisions
> may change. And, it is not something you can figure out by
> looking at the code.
>
> Parallel computing is a hard problem and no one has
> really come up with a general-purpose way to write parallel software.
> MPI works; however, I still consider it a "parallel machine code"
> that requires some careful programming.
>
> The good news is most of the popular HPC applications
> have been ported and will run using MPI (as best as their algorithm
> allows). So from an end user perspective, most everything
> works. Of course there could be more applications ported
> to MPI but it all depends. Maybe end users can get enough
> performance with a CUDA version and some GPUs or an
> OpenMP version on a 64-core server.
>
> Thus the incentive is not really there. There is no huge financial
> push behind HPC software tools like there is with data analytics.
>
> Personally, I like Julia and believe it is the best new language
> to enter technical computing. One of the issues it addresses is
> the two-language problem. The first cut of something is often written
> in Python; then, if it gets to production, is slow, and does
> not have an easy parallel pathway (local multi-core or distributed),
> the code is rewritten in C/C++ or Fortran with MPI, CUDA, or OpenMP.
>
> Julia is fast out of the box and provides a growth path to
> parallel execution. One version with no need to rewrite. Plus,
> it has something called "multiple dispatch" that provides
> unprecedented code flexibility and portability. (Too long a
> discussion for this email.) Basically, it keeps the end user closer
> to their "problem" and further away from the hardware minutiae.
>
> That is enough for now. I'm sure others have opinions worth
> hearing.
>
>
> --
> Doug
>
>
>
> > Thanks!
> >
>
>