[Beowulf] Using Autoparallel compilers or Multi-Threaded libraries with MPI

Fri Nov 30 06:49:41 PST 2007

> Many problems decompose well in large chunks that are well done
> with MPI on different nodes, and tight loops that are best done
> locally with shared memory processors.

I think the argument against this approach is more based on practice
than principles.  hybrid parallelism certainly is possible, and in 
the most abstract sense makes sense.

however, it means a lot of extra work, and it's not clear how much benefit you get.
if you had an existing code or library which very efficiently used threads
to handle some big chunk of data, it might be quite simple to add some MPI
to handle big chunks in aggregate.  but I imagine that would most likely
happen if you're already data-parallel, which also means embarrassingly
parallel.  for instance, apply this convolution to this giant image - 
it would work, but it's also not very interesting (ie, you ask "then what
happens to the image?  and how much time did we spend getting it distributed
and collecting the results?")
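
to make the image example concrete, here is a rough sketch in plain Python (no MPI; the "ranks" are just loop iterations).  the names blur_rows and distributed_blur, the 3x3 box filter, and the one-row halo are all assumptions made for illustration, not anything from a real code.  it splits an image into row strips, blurs each strip independently, and counts how many pixels had to be shipped out and back:

```python
def blur_rows(img, lo, hi):
    """3x3 box sum for rows lo..hi-1 of img, clamping at the borders of img."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(lo, hi):
        row = []
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            row.append(total)
        out.append(row)
    return out


def distributed_blur(img, n_ranks):
    """Pretend to scatter row strips to n_ranks workers and gather the results.

    Returns (blurred image, number of pixels moved in both directions).
    """
    h = len(img)
    moved = 0
    result = []
    for rank in range(n_ranks):
        lo = rank * h // n_ranks
        hi = (rank + 1) * h // n_ranks
        if lo == hi:
            continue  # more ranks than rows: this rank gets nothing
        top = max(lo - 1, 0)   # one halo row above, if it exists
        bot = min(hi + 1, h)   # one halo row below, if it exists
        strip = [row[:] for row in img[top:bot]]      # the "scatter"
        moved += sum(len(row) for row in strip)
        # in a hybrid code, this call is the tight loop you'd thread locally
        local = blur_rows(strip, lo - top, hi - top)
        moved += sum(len(row) for row in local)       # the "gather"
        result.extend(local)
    return result, moved
```

the strips never talk to each other, which is what makes this embarrassingly parallel; the moved count is the part the "how much time did we spend" question is about, since every pixel crosses the network at least twice for a handful of adds.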

for more realistic cases, something like an AMR (adaptive mesh refinement) code,
I suspect the code would wind up being structured as a thread dedicated to
inter-node data movement, interfacing with a thread task queue to deal with the
irregularity of intra-node computations.  that's a reasonably complicated piece
of code I guess,
and before undertaking it, you need to ask whether a simpler model of 
just one-mpi-worker-per-processor would get you similar speed but with
less effort.  consider, as well, that if you go to a work-queue for handing
bundles of work to threads, you're already doing a kind of message passing.
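
a minimal sketch of that last point, using only Python's standard threading and queue modules.  run_work_queue, the STOP sentinel, and the sum-of-squares "work" are made up for illustration; a real AMR code would be far messier.  note that the workers never share data directly, they only receive a bundle and send back a result:

```python
import queue
import threading


def run_work_queue(bundles, n_workers=4):
    """Hand irregular bundles to worker threads; return one result per bundle."""
    tasks = queue.Queue()
    results = queue.Queue()
    STOP = None  # sentinel message telling a worker to exit

    def worker():
        while True:
            msg = tasks.get()            # effectively a blocking receive
            if msg is STOP:
                break
            bundle_id, data = msg
            value = sum(x * x for x in data)   # stand-in for real computation
            results.put((bundle_id, value))    # effectively a send

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, bundle in enumerate(bundles):
        tasks.put((i, bundle))
    for _ in threads:
        tasks.put(STOP)                  # one stop message per worker
    for t in threads:
        t.join()

    # workers finish in arbitrary order, so reorder by bundle id
    by_id = {}
    while not results.empty():
        bundle_id, value = results.get()
        by_id[bundle_id] = value
    return [by_id[i] for i in range(len(bundles))]
```

squint and tasks.get()/results.put() look a lot like a receive and a send with a tag, which is the sense in which the thread side of a hybrid code ends up reinventing message passing.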

if we've learned anything about programming, I think it's that simpler
mental models are to be desired.  not that MPI is ideal, of course!
just that it's simpler than MPI+OpenMP.

-mark hahn