<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I'm not an expert on Big Data at all, but I hear the phrase
"Hadoop" less and less these days. Where I work, most data
analysts are using R, Python, or Spark in the form of PySpark. For
machine learning, most of the researchers I support are using
Python tools like TensorFlow or PyTorch. <br>
</p>
<p>I don't know much about Julia replacing MPI, etc., but I wish I
did. I would like to know more about Julia. <br>
</p>
<p>Prentice<br>
</p>
<div class="moz-cite-prefix">On 10/12/20 12:14 PM, Oddo Da wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CALFK+OaCvNq7txN6OnkAQz5Xv3VrDSDrEpkU2UgmOOkqMH6jCg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Hello,</div>
<div><br>
</div>
<div>I used to be in HPC back when we built Beowulf clusters by
hand ;) and wrote code in C/pthreads, PVM, and MPI, back
when anyone could walk into fields like bioinformatics; all
that was needed was a pulse, some C and Perl, and a desire to
do ;-). Then I left for the private sector and stumbled into
"big data" some years later - I wrote a lot of code in Spark
and Scala, worked in infrastructure to support it, etc.</div>
<div><br>
</div>
<div>Then I went back (in 2017) to HPC. I was surprised to find
that not much had changed - researchers and grad students
still write code in MPI and C/C++, with maybe some Python or R
for visualization or localized data analytics. I also noticed
that it was not easy to "marry" things like big data with HPC
clusters - tools like Spark/Hadoop do not really share the same
underlying infrastructure assumptions as things like
MPI/supercomputers. However, I find it wasteful for a
university to run separate clusters to support a data
science/big data load vs. traditional HPC.<br>
</div>
<div><br>
</div>
<div>I then stumbled upon languages like Julia - I like its
approach: code is data, visualization is easy, and it has
decent ML/DS tooling. <br>
</div>
<div><br>
</div>
<div>How does it fare on a traditional HPC cluster? Are people
using it to replace their MPI workloads? On the opposite side,
has it caught up to Spark in terms of the quality of its DS/ML
offering? In other words, can it serve, in one fell swoop, as
a unifying substitute for both opposing approaches? <br>
</div>
</div>
<div><br>
</div>
<div>I realize that many people have already committed to
certain tech/paradigms, but that is mostly educational debt (if
MPI, or Spark on the other side, is working for me, why move to
something different?) - so is there anything substantial
stopping new people with no such debt from starting out with a
different approach (offerings like Julia)?</div>
<div><br>
</div>
<div>I do not have much experience with Julia (and hence may
be barking up the wrong tree) - in that case I am wondering
what people are doing to "marry" the workloads of traditional
HPC with "big data" as practiced by commercial/industry
entities on a single underlying hardware offering. I know
there are things like Twister2, but it is unclear to me (from a
cursory examination) what it actually offers in the context of
my questions above.<br>
</div>
<div><br>
</div>
<div> Any input, corrections, schooling, etc. are appreciated.</div>
<div><br>
</div>
<div>Thank you!<br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
Beowulf mailing list, <a class="moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit <a class="moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a>
</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
<a class="moz-txt-link-freetext" href="http://www.pppl.gov">http://www.pppl.gov</a></pre>
</body>
</html>