<div dir="ltr">Justin,<div>I'm unsure just what you mean by some of what you said.</div><div><br></div><div>"<span style="font-size:12.8px">Any fixed program-processor binding is a single point failure"</span></div><div><span style="font-size:12.8px">I'm troubled by the word "any". What about running two copies of a program, each with its own copy of the same data, on two processors (e.g. on a Tandem machine)? Surely that is not a single point of failure; is it not a "fixed program-processor binding"?</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">"...</span><span style="font-size:12.8px"> </span><span style="font-size:12.8px">it is impossible to implement reliable communication in the face of..."</span></div><div><span style="font-size:12.8px">If by "reliable" you mean "perfectly reliable" then the thesis is trivial and does not require proof. Reliability is a measurable quantity with costs; the cost is space (e.g. for error-correcting codes), time (e.g. for re-transmissions), or some other resource. Do you mean that MPI is cost-ineffective in proportion to reliability? If so, why?</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Thanks,</span></div><div><span style="font-size:12.8px">Peter</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 6, 2016 at 11:10 AM, Justin Y. Shi <span dir="ltr"><<a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Actually, my interest in your group is less about "hate" or "love" of MPI or any other API. I am more interested in the "correctness" of parallel APIs.<div><br></div><div>Three decades ago, effective parallel processing was impossible without "bare metal" computing. 
Today, insisting on "bare metal" computing is detrimental to extreme-scale efforts.</div><div><br></div><div>Any fixed program-processor binding is a single point failure. The problem only shows when the application scales. And it is impossible to implement reliable communication in the face of crashes [Alan Fekete, Nancy Lynch and John Spinelli's 1993 JACM paper proved this theoretically]. Therefore, any direct program-program communication API is theoretically incorrect for extreme-scale applications. </div><div><br></div><div>The <key, value> pair API seems to be the only theoretically correct parallel programming API that can take us out of the abyss of impossibilities. However, systems like Hadoop and Spark have only shown the promise of program-device decoupling; they were not really designed for tackling HPC applications, and their runtime implementations leave the decoupling incomplete.</div><div><br></div><div>I proposed a Statistic Multiplexed Computing idea leveraging the successes of <key, value> API systems and old Tuple Space semantics. My GitHub contribution is called Synergy3.0+. You are welcome to check it out and do a "bare metal" comparison against MPI and any other API.</div><div><br></div><div>Our latest development is AnkaCom, which was designed to tackle data-intensive HPC without scaling limits.</div><div><br></div><div>My apologies in advance for my shameless self-advertising. I am looking for serious collaborators who are interested in breaking this decade-old barrier.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Justin Y. 
Shi</div><div><a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a></div><div>SERC 310</div><div>Temple University</div><div><a href="tel:%2B1%28215%29204-6437" value="+12152046437" target="_blank">+1(215)204-6437</a><br><div><br></div><div><br></div></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 4, 2016 at 10:14 AM, C Bergström <span dir="ltr"><<a href="mailto:cbergstrom@pathscale.com" target="_blank">cbergstrom@pathscale.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">A few people have subscribed and it's great to see some interest -<br>
hopefully we can start some interesting discussions. Actually - my<br>
background is more on the "web" side of HPC. I took a big jump when I<br>
started working @pathscale - Over the past 6 years I've cringed more<br>
than once when I see a design that looks ***worse*** (I didn't think it<br>
possible) than Hibernate with tons of outer joins and evil XML<br>
configs.. (Java references for anyone unfortunate enough to get what<br>
I'm saying)<br>
<div><div><br>
<br>
<br>
<br>
On Fri, Mar 4, 2016 at 10:05 PM, Justin Y. Shi <<a href="mailto:shi@temple.edu" target="_blank">shi@temple.edu</a>> wrote:<br>
> Thank you for creating the list. I have subscribed.<br>
><br>
> Justin<br>
><br>
> On Fri, Mar 4, 2016 at 5:43 AM, C Bergström <<a href="mailto:cbergstrom@pathscale.com" target="_blank">cbergstrom@pathscale.com</a>><br>
> wrote:<br>
>><br>
>> Sorry for the shameless self indulgence, but there seems to be a<br>
>> growing trend of love/hate around MPI. I'll leave my opinions aside,<br>
>> but at the same time I'd love to connect and host a list where others who<br>
>> are passionate about scalability can vent and openly discuss ideas.<br>
>><br>
>> Despite the comical name, I've created mpi-haters mailing list<br>
>> <a href="http://lists.pathscale.com/mailman/listinfo/mpi-haters_lists.pathscale.com" rel="noreferrer" target="_blank">http://lists.pathscale.com/mailman/listinfo/mpi-haters_lists.pathscale.com</a><br>
>><br>
>> To start things off - Some of the ideas I've been privately bouncing<br>
>> around<br>
>><br>
>> Can current directive-based approaches (OMP/ACC) be extended to scale<br>
>> out? (I've seen some research out of Japan on this or similar)<br>
>><br>
>> Is Chapel's C-like syntax similar enough to easily implement in clang?<br>
>><br>
>> Can one low-level library succeed at creating a clean interface across<br>
>> all popular industry interconnects (libfabric vs UCX)?<br>
>><br>
>> Real world success or failure of "exascale" runtimes? (What's your<br>
>> experience - let's not pull any punches)<br>
>><br>
>> I won't claim to see ridiculous scalability in most web applications<br>
>> I've worked on, but they had so many tools available - Why have I<br>
>> never heard of memcache being used in a supercomputer and/or why isn't<br>
>> sharding ever mentioned...<br>
>><br>
>> Everyone is welcome and let's keep it positive and fun - invite your<br>
>> friends<br>
>><br>
>><br>
>> ./C<br>
>><br>
>> ps - Apologies if you get this message more than once.<br>
>> _______________________________________________<br>
>> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
>> To change your subscription (digest mode or unsubscribe) visit<br>
>> <a href="http://www.beowulf.org/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
><br>
><br>
</div></div></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br></div>