[Beowulf] multi-threading vs. MPI

Jim Lux James.P.Lux at jpl.nasa.gov
Tue Dec 11 10:30:25 PST 2007


At 05:17 AM 12/11/2007, Douglas Eadline wrote:
>This is indeed the issue. Where to invest time?
>
>My opinion, and it is only my opinion, is the following.
>Please share your own.
>
>Threaded approaches do not scale across clusters. The memory
>architecture of multi-core is making nodes look more like
>small clusters i.e. memory is becoming more localized.
>As Don Becker mentioned in a recent post, efforts to program
>distributed memory as if it were shared memory often end
>up looking like stylized message passing systems.
>
>One other thing about messages. The problem of
>optimizing the compute-to-communication trade-off is
>easier than the problem of optimizing the compute-to-locality
>trade-off.
>
>Therefore, if I were to start a new parallel project of some sort
>or parallelize an existing code, I would use MPI. Although
>OpenMP might get me up and running quicker, I would feel more
>comfortable with a problem cast in MPI.
>
>I'm interested in others' opinions on this because I think it
>is an important issue for the general programming audience
>and not just us cluster geeks. The difference is we have had
>a lot more time and experience with this stuff.
>
>--
>Doug


Another huge advantage of the message passing paradigm is that it
forces you to deal explicitly with time synchronization (or the lack
thereof) among processes, because an underlying assumption is that
passing a message takes non-zero time.  Therefore, in any message
passing system, there's not necessarily any concept of "absolute
time" shared by all processes.  (You have to pass time messages,
just like any others.)

As the propagation delay (light time) among processors gets to be a
significant fraction of the time it takes to transmit a message, this
becomes a bigger and bigger deal.
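To put rough numbers on that, here is a minimal sketch that compares the one-way light time over a link with the time needed to clock the message onto the wire. The function name, the parameters, and the example link (100 m, 8000 bits, 1 Gb/s) are illustrative assumptions, not figures from this thread; real links also add switch and software latency that this ignores.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def delay_to_duration_ratio(distance_m, message_bits, bit_rate_bps,
                            velocity_factor=1.0):
    """Ratio of one-way propagation delay to message transmission time.

    velocity_factor is the signal speed as a fraction of c (assumed 1.0
    here; it is lower in copper or fiber).
    """
    propagation_s = distance_m / (C * velocity_factor)
    duration_s = message_bits / bit_rate_bps
    return propagation_s / duration_s


if __name__ == "__main__":
    # Hypothetical link: 100 m, 8000-bit message, 1 Gb/s.
    ratio = delay_to_duration_ratio(100.0, 8000, 1e9)
    print(f"propagation delay / message duration = {ratio:.3f}")
```

For that hypothetical link the ratio is roughly 0.04. Shortening the message or raising the bit rate pushes the ratio up, which is when the light time starts to dominate.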

For me, this is an issue because I work with systems that are
distributed over huge distances (where light time is seconds or
minutes, and it varies), but it also applies at a finer grain, where
the delays in the communications paths are on the
microsecond/millisecond scale, especially if they are variable and
non-deterministic.  (NTP, for instance, assumes that the delays are
deterministic in the long-term sense, even if there's a lot of
short-term variability.)
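As an illustration of what "passing time messages" involves, here is a minimal sketch of the standard four-timestamp exchange used by NTP-style protocols to estimate a remote clock's offset. The function names and the simulated delays are assumptions made for the example. The estimate is exact only when the forward and return delays are equal; any asymmetry shows up as an offset error of half the difference.

```python
def estimate_offset_and_delay(t1, t2, t3, t4):
    """Estimate remote-minus-local clock offset and round-trip delay.

    t1: request sent      (local clock)
    t2: request received  (remote clock)
    t3: reply sent        (remote clock)
    t4: reply received    (local clock)
    """
    round_trip = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    return offset, round_trip


def simulate_exchange(true_offset, forward_delay, return_delay,
                      t1=0.0, processing=0.0):
    """Hypothetical helper: build the four timestamps for known delays."""
    t2 = t1 + forward_delay + true_offset   # remote clock at arrival
    t3 = t2 + processing                    # remote clock at reply
    t4 = t1 + forward_delay + processing + return_delay  # local clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    for fwd, ret in [(0.2, 0.2), (0.3, 0.1)]:
        stamps = simulate_exchange(5.0, fwd, ret, t1=100.0, processing=0.05)
        offset, rtt = estimate_offset_and_delay(*stamps)
        print(f"fwd={fwd} ret={ret}: offset={offset:.3f} rtt={rtt:.3f}")
```

With a true offset of 5.0, the symmetric case recovers 5.0, while the asymmetric case (0.3 out, 0.1 back) is off by 0.1, which is why variable, non-deterministic path delays are the hard part.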

Jim Lux





