<div dir="ltr">Michael, thank you, you have given me quite a lot to think about.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 14, 2020 at 2:28 PM Michael Di Domenico <<a href="mailto:mdidomenico4@gmail.com">mdidomenico4@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, Oct 14, 2020 at 2:07 PM Oddo Da <<a href="mailto:oddodaoddo@gmail.com" target="_blank">oddodaoddo@gmail.com</a>> wrote:<br>
><br>
> You stated that Spark/Hadoop approach can code for everything that MPI can code for and vice versa. If this is all true and it is that easy, nobody would have "invented" them since we already had MPI/C/C++ to solve all our problems ;-).<br>
<br>
i'm not sure i meant it as pointedly as you have stated it here. this is<br>
the difference between whether something can be done and whether it should<br>
be. yes, you can solve dense linear algebra on a Spark cluster, but you<br>
shouldn't.<br>
<br>
> I disagree. I think yes, there is old code that does not churn but there are always new people/grad students coming into the field. They too are being pointed in the same direction of how to do things, which is what we are discussing here ;-)<br>
<br>
I'm not sure I agree. I interact with a LOT of post-docs, and many have<br>
no idea what MPI is, let alone how to use it. but i'm not entirely in<br>
academia, so i can't say that for certain.<br>
<br>
> It seems that in your world nothing new ever gets written? You are talking only about re-writes ;).<br>
<br>
not entirely. you're making my point a little more pointed than i<br>
intended. but if you look at the big traditional heavy hpc codes, i<br>
think you'll find "re-writes" are uncommon. if you look at the parallel<br>
in the "cloud" world, though, re-writing the entire code base of some<br>
module because it's tuesday happens more often than it should.<br>
<br>
> This is probably true. What is the rest of the 80% of the load in your HPC world?<br>
<br>
we run the gamut of stuff, everything from ML frameworks to user code<br>
in C/python/etc to stuff like magma and matlab.<br>
<br>
> Programming languages are a part of it and I have said this before - languages like Julia can incorporate MPI as an underlying (or one of underlying) mechanisms/libraries to distribute computation. I have nothing against MPI (as I have stated before). I have something - curiosity - about what is holding a field in a certain state. Spark is a framework but I think it is much more than MPI, by the way - as it is both a way to distribute computation, but there is also lazy evaluation, resilient datasets, Scala, functional programming etc.<br>
<br>
but see, you're comparing three entirely different things: Spark =<br>
framework, Scala = language, MPI = library. if you wanted to compare<br>
Spark to something in HPC, there's probably a parallel application, but<br>
i can't think of one off the top of my head.<br>
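to make that framework-vs-library distinction concrete, here's a rough plain-python sketch. nothing below is real Spark or MPI code; the names and the four "ranks" are just stand-ins. the point is the shape of the two models: with a Spark-style framework you declare what to compute over a dataset and the runtime decides how to distribute it, while with an MPI-style library the programmer explicitly partitions the data and combines the partial results.<br>

```python
from functools import reduce

data = list(range(1, 101))  # toy dataset: sum 2*x over 1..100

# Spark-style (framework): chain declarative transformations; the
# framework would handle partitioning and shuffling across a cluster.
# functools.reduce stands in for what an RDD-style reduce would do.
framework_total = reduce(lambda a, b: a + b, map(lambda x: x * 2, data))

# MPI-style (library): the programmer owns the decomposition. each
# simulated "rank" computes a local partial sum, and the final sum
# stands in for an explicit reduction to rank 0.
num_ranks = 4
partials = []
for rank in range(num_ranks):
    chunk = data[rank::num_ranks]               # each rank owns a strided slice
    partials.append(sum(x * 2 for x in chunk))  # local computation
library_total = sum(partials)                   # explicit combine step

print(framework_total, library_total)  # both print 10100
```

same answer either way, which is the earlier point: the question is rarely whether one model can express the computation, but which one you should reach for.<br>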
<br>
i think the stuck state you're perceiving is really the misconception<br>
that HPC is full of stodgy greybeards who only want to run MPI code<br>
written in 1970's fortran. i don't think that's the case anymore.<br>
HPC has branched out and includes a lot of ancillary paths, but it<br>
still holds onto its heritage, which is something I appreciate. HPC<br>
has never been about flash; it's about solving the world's hardest<br>
problems. you don't always need a porsche, sometimes a yugo works<br>
just as well.<br>
</blockquote></div>