<html><body>
<DIV> </DIV>
<BLOCKQUOTE style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid">
<P>-------------- Original message -------------- <BR>From: Håkon Bugge &lt;Hakon.Bugge@scali.com&gt; <BR><BR>> I have a slightly different view. Hybrid <BR>> programming is used for performance reasons, but <BR>> only in cases where parallelization (to the same <BR>> level) is impossible/impractical using the pure <BR>> MPI mode, or the parallelization yields low <BR>> efficiency. So, if you're able to achieve your <BR>> performance with MPI, you probably will. But <BR>> there are cases where you cannot; a) the <BR>> "decomposition parallel efficiency" is not good <BR>> enough or b) the processes need a huge (shared) table. <BR></P>
<P>I think what is being said here is that an application may be decomposable in some dimensions but not in all of them. If the performance benefit of managing the "unruly" dimensions locally, within a node, is large enough, then a hybrid program may be worth the trouble. I suspect the number of real-world applications in this class is not large, or there would be more hybrid code. </P>
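<P>The "decomposition parallel efficiency" in the quoted message can be made concrete with the usual definition E(p) = T(1) / (p &middot; T(p)), where T(1) is the single-process run time and T(p) the run time on p processes. The short Python sketch below simply computes speedup and efficiency; the run times in it are hypothetical placeholders, not measurements of any real pure-MPI or hybrid code.</P>

```python
# Minimal sketch: speedup and parallel efficiency.
#   S(p) = T(1) / T(p)
#   E(p) = S(p) / p = T(1) / (p * T(p))
# All timings below are hypothetical, for illustration only.


def speedup(t_serial: float, t_parallel: float):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / p


if __name__ == "__main__":
    t1 = 1000.0          # hypothetical single-process run time (s)
    p = 64               # hypothetical number of cores used
    t_pure_mpi = 25.0    # hypothetical pure-MPI run time on p cores (s)
    t_hybrid = 20.0      # hypothetical hybrid MPI + threads run time (s)
    print(f"pure MPI efficiency: {efficiency(t1, t_pure_mpi, p):.3f}")
    print(f"hybrid efficiency:   {efficiency(t1, t_hybrid, p):.3f}")
```

<P>In this framing, the extra effort of a hybrid code pays off only when, at the same core count, it lifts efficiency noticeably above what the pure-MPI decomposition achieves.</P>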
<P>Another perhaps relevant alternative, one that may at some point be able to handle both the partitionable and unpartitionable extremes and everything in between, is the PGAS language extensions (UPC and CAF). They are not yet at performance parity with well-coded MPI on distributed memory, but they arguably have an intrinsic programmability advantage in lines of code (LOC) and in data-structure coverage. AMR codes tracking shed vortices are inherently non-partitionable (or need regular repartitioning). Managing them with either MPI or OpenMP in a distributed-memory environment is a chore.</P>
<P>And if you believe that ... ;-) ... then there is of course the "magic" of many-threaded latency hiding (I can't say I am a true believer when it comes to the data-intensive Oz of HPC). Some would have you believe that a 64-thread, 8-core Niagara 2 (or perhaps a future design with an even higher ratio of active threads to cores) can hide all of your data-latency events behind its active-thread horizon. </P>
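<P>A back-of-the-envelope model shows why data-intensive codes strain the "active thread horizon". Assume, as a simplification, that each thread computes for c cycles and then stalls for L cycles on a memory access, and that the core switches threads at zero cost. The core stays busy only if n &middot; c &ge; c + L, i.e. n &ge; 1 + L/c threads per core. The Python sketch below encodes this idealized model; it ignores bandwidth limits, cache effects, and switch overhead, and the cycle counts used with it are hypothetical.</P>

```python
import math

# Idealized fine-grained multithreading model (an assumption, not a
# description of any specific processor):
#   - each thread computes for `compute_cycles`, then waits
#     `latency_cycles` for memory;
#   - the core switches between ready threads at zero cost.
# The core is fully busy when n * compute >= compute + latency.


def threads_to_hide_latency(latency_cycles: int, compute_cycles: int):
    """Smallest thread count per core that keeps the core fully busy."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


def core_utilization(n_threads: int, latency_cycles: int, compute_cycles: int):
    """Fraction of cycles doing useful work with n_threads threads per core."""
    busy = n_threads * compute_cycles
    return min(1.0, busy / (compute_cycles + latency_cycles))


if __name__ == "__main__":
    latency = 100  # hypothetical memory latency in cycles
    for compute in (20, 5):  # compute-heavy vs. data-intensive thread
        need = threads_to_hide_latency(latency, compute)
        util8 = core_utilization(8, latency, compute)
        print(f"compute={compute}: need {need} threads/core; "
              f"8 threads give {util8:.0%} utilization")
```

<P>Under these assumptions, a thread that computes for 20 cycles between 100-cycle misses needs 6 threads per core, while one that computes for only 5 cycles needs 21, which is roughly why the approach looks less convincing for data-intensive work.</P>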
<P>Maybe the key is to combine PGAS with many-threads ... mmm ... anyone doing this?</P>
<P>;-) </P>
<P>rbw</P>
<P>-- <BR><BR>"Making predictions is hard, especially about the future." <BR><BR>Niels Bohr <BR><BR>-- <BR><BR>Richard Walsh <BR>Thrashing River Consulting <BR>5605 Alameda St. <BR>Shoreview, MN 55126 <BR><BR>Phone #: 612-382-4620</P></BLOCKQUOTE></body></html>