[Beowulf] Really efficient MPIs??

Nathan Moore ntmoore at gmail.com
Wed Nov 28 08:21:02 PST 2007


I've not tried their respective MPI libraries, but as a general rule, the
people who manufacture the chips have the best idea of how to optimize a
library for them.  (There are obvious counter-examples, GotoBLAS and FFTW
among them.)

That said, for Intel, have you tried:
http://www.intel.com/cd/software/products/asmo-na/eng/308295.htm

or, for AMD, http://developer.amd.com/devtools.jsp (they link to HP's MPI)?

As a side note, IBM uses a slightly modified version of MPICH for Blue Gene.

Nathan


On Nov 28, 2007 9:48 AM, Christian Bell <christian.bell at qlogic.com> wrote:

> But the main point with MPI implementations, even more than usual
> with shared memory, is to run your own application.
>
> For two different MPI shared-memory implementations that show equal
> performance on point-to-point microbenchmarks, you can still measure
> very different performance in applications (mostly in the
> bandwidth-bound regime).
>
> Microbenchmarks assume senders and receivers are always synchronized
> in time and report memory copy performance for memory copies that go
> mostly through the cache.  Memory transfers that are mostly out of
> cache are rarely tuned for or even measured.
>
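
To make that concrete, here is a rough sketch (mine, not Christian's) of the
kind of ping-pong microbenchmark being described.  The same small buffer is
bounced back and forth, so the two ranks stay in lockstep and the data stays
cache-resident; the 8 KB message size and iteration count are arbitrary
choices for illustration, and neither rank ever reads or writes the payload.
Run it with two ranks:

/* Sketch of a classic ping-pong microbenchmark: synchronized ranks,
 * one small reused buffer, payload never actually consumed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 8 * 1024;        /* 8 KB: comfortably cache-resident */
    const int iters = 10000;
    int rank, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = calloc(n, 1);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round trip: %g us\n", 1e6 * (t1 - t0) / iters);

    free(buf);
    MPI_Finalize();
    return 0;
}
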
> Microbenchmarks also never have the receivers actually consume the
> data that's received or have senders re-reference the data sent for
> computation.  The cost of these application-level memory accesses is
> greatly determined by where in the memory hierarchy the MPI
> implementation left the data to be computed on.  And finally, a given
> implementation will have very different performance characteristics
> on Opteron versus Intel, few-core versus many-core and point-to-point
> versus collectives.
>
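
And here is a sketch (again mine, with arbitrary sizes) of the
application-style pattern described above: the sender writes the buffer
before sending, the receiver reads every element after receiving, and the
buffer is deliberately much larger than a typical last-level cache, so where
the MPI library leaves the data in the memory hierarchy shows up directly in
the timing:

/* Sketch of an application-style transfer: the data is produced before
 * the send and consumed after the receive, and the 32 MB buffer will
 * not fit in cache, unlike the ping-pong payload above. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 4 * 1024 * 1024;   /* 4M doubles = 32 MB, out of cache */
    int rank, i;
    double *buf, sum = 0.0, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(n * sizeof(double));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    if (rank == 0) {
        for (i = 0; i < n; i++)      /* produce the data, as an app would */
            buf[i] = (double)i;
        MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        for (i = 0; i < n; i++)      /* consume what arrived */
            sum += buf[i];
    }
    t1 = MPI_Wtime();

    if (rank == 1)
        printf("elapsed on receiver: %g s (sum = %g)\n", t1 - t0, sum);

    free(buf);
    MPI_Finalize();
    return 0;
}
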
> It's safe to assume that most if not all MPIs try to do something
> about shared memory, but I wouldn't be surprised if each of them tops
> out a different performance curve on some specific system.
>
>
>    . . christian
>
> On Wed, 28 Nov 2007, amjad ali wrote:
>
> > Hello,
> >
> > Today, clusters with multicore nodes are quite common, and the cores
> > within a node share memory.
> >
> > Which implementations of MPI (commercial or free) make automatic and
> > efficient use of shared memory for message passing within a node?
> > (That is, which MPI libraries automatically communicate over shared
> > memory instead of over the interconnect when the processes are on the
> > same node?)
> >
> > regards,
> > Ali.
>
> --
> christian.bell at qlogic.com
> (QLogic Host Solutions Group, formerly Pathscale)



-- 
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - -   - - - - - - -   - - - - - - -

