[Beowulf] Using Autoparallel compilers or Multi-Threaded libraries with MPI
Toon Knapen
toon.knapen at gmail.com
Fri Nov 30 07:59:22 PST 2007
IMHO the hybrid approach (MPI + threads) is interesting when every
MPI process has a lot of local data.
If you have a cluster of quad-cores, you can either run one process per
node with each process using 4 threads, or run one MPI process per core.
The latter is simpler because it only requires MPI parallelism, but if the
code is memory-bound and every MPI process holds much of the same data, it
is better to share that common data among the cores of a node and thus
use threads intra-node.
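
For illustration, a minimal sketch of that first option, assuming OpenMP for
the intra-node threads and one MPI rank per node (the thread count and the
shared table are just placeholders):

  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      /* ask for a threading level where only the master thread calls MPI */
      int provided;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* one rank per node: the read-only common data lives once per node
         and is shared by all threads, instead of being copied per core */
      double shared_table[1024];
      for (int i = 0; i < 1024; ++i) shared_table[i] = i;

      #pragma omp parallel num_threads(4)
      {
          int tid = omp_get_thread_num();
          /* every thread reads the same shared_table, no per-core copy */
          printf("rank %d thread %d sees table[0]=%g\n",
                 rank, tid, shared_table[0]);
      }

      MPI_Finalize();
      return 0;
  }

You would launch this with one process per node; the exact placement flags
depend on your MPI implementation.
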
toon
On 11/30/07, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > Many problems decompose well into large chunks that are well handled
> > with MPI on different nodes, and tight loops that are best handled
> > locally with shared-memory processors.
>
> I think the argument against this approach is based more on practice
> than on principle. hybrid parallelism certainly is possible, and in
> the most abstract sense it makes sense.
>
> however, it means a lot of extra work, and it's not clear how much
> benefit it buys you.
> if you had an existing code or library which very efficiently used threads
> to handle some big chunk of data, it might be quite simple to add some MPI
> to handle big chunks in aggregate. but I imagine that would most likely
> happen if you're already data-parallel, which also means embarrassingly
> parallel. for instance, apply this convolution to this giant image -
> it would work, but it's also not very interesting (ie, you ask "then what
> happens to the image? and how much time did we spend getting it distributed
> and collecting the results?")
>
> for more realistic cases, something like an AMR code, I suspect the code
> would wind up being structured as a thread dedicated to inter-node data
> movement, interfacing with a thread task queue to deal with the
> irregularity of intra-node computations. that's a reasonably complicated
> piece of code, I guess, and before undertaking it, you need to ask whether
> a simpler model of just one-mpi-worker-per-processor would get you similar
> speed with less effort. consider, as well, that if you go to a work queue
> for handing bundles of work to threads, you're already doing a kind of
> message passing.
>
> if we've learned anything about programming, I think it's that simpler
> mental models are to be desired. not that MPI is ideal, of course!
> just that it's simpler than MPI+OpenMP.
>
> -mark hahn
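
For the convolution-over-a-giant-image case above, a rough sketch of the
hybrid pattern might look like the following (the 3-point filter, the flat
1-D "image", and the even chunking are placeholders, and the halo exchange a
real stencil would need is omitted):

  #include <mpi.h>
  #include <stdlib.h>

  /* placeholder kernel: 3-point average over one chunk, threaded */
  static void smooth(const double *in, double *out, int n)
  {
      #pragma omp parallel for
      for (int i = 1; i < n - 1; ++i)
          out[i] = (in[i-1] + in[i] + in[i+1]) / 3.0;
      out[0] = in[0];
      out[n-1] = in[n-1];
  }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      const int total = 1 << 20;          /* the "giant image", flattened */
      const int chunk = total / nprocs;   /* assume it divides evenly     */

      double *image = NULL;
      if (rank == 0) {
          image = malloc(total * sizeof *image);
          for (int i = 0; i < total; ++i) image[i] = (double)i;
      }
      double *in  = malloc(chunk * sizeof *in);
      double *out = malloc(chunk * sizeof *out);

      /* distribute the chunks, filter them with threads, collect results */
      MPI_Scatter(image, chunk, MPI_DOUBLE, in, chunk, MPI_DOUBLE,
                  0, MPI_COMM_WORLD);
      smooth(in, out, chunk);
      MPI_Gather(out, chunk, MPI_DOUBLE, image, chunk, MPI_DOUBLE,
                 0, MPI_COMM_WORLD);

      /* and here is Mark's question: how much of the wall time just went
         into the Scatter and Gather? */
      free(in); free(out); free(image);
      MPI_Finalize();
      return 0;
  }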
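
And on the last point above: a work queue handing bundles of work to threads
does indeed look a lot like message passing. A purely illustrative sketch
with pthreads (a fixed-size ring of integer work ids standing in for real
work bundles):

  #include <pthread.h>
  #include <stdio.h>

  #define QCAP 64

  static int queue[QCAP];
  static int head = 0, tail = 0, done = 0;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

  /* the "send": the producer drops a work id into the ring */
  static void push(int item)
  {
      pthread_mutex_lock(&lock);
      queue[tail % QCAP] = item;   /* assume the producer never overruns */
      tail++;
      pthread_cond_signal(&nonempty);
      pthread_mutex_unlock(&lock);
  }

  /* the "receive": each worker blocks until a work id arrives */
  static void *worker(void *arg)
  {
      (void)arg;
      for (;;) {
          pthread_mutex_lock(&lock);
          while (head == tail && !done)
              pthread_cond_wait(&nonempty, &lock);
          if (head == tail) {              /* drained and shutting down */
              pthread_mutex_unlock(&lock);
              return NULL;
          }
          int item = queue[head % QCAP];
          head++;
          pthread_mutex_unlock(&lock);
          printf("worker got item %d\n", item);   /* do the computation */
      }
  }

  int main(void)
  {
      pthread_t t[4];
      for (int i = 0; i < 4; ++i) pthread_create(&t[i], NULL, worker, NULL);
      for (int i = 0; i < 16; ++i) push(i);

      pthread_mutex_lock(&lock);
      done = 1;                            /* no more work is coming */
      pthread_cond_broadcast(&nonempty);
      pthread_mutex_unlock(&lock);

      for (int i = 0; i < 4; ++i) pthread_join(t[i], NULL);
      return 0;
  }

The push/wait handshake here plays much the same role a send/receive pair
would play across nodes, which is, I think, Mark's point.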