[Beowulf] Really efficient MPIs??
Ashley Pittman apittman at concurrent-thinking.com
Thu Nov 29 04:25:41 PST 2007
On Wed, 2007-11-28 at 10:21 -0600, Nathan Moore wrote:
> I've not tried their respective MPI libraries, but as a general rule,
> the people who manufacture the chips have the best idea of how to
> optimize a given library. (There are obvious counter-examples,
> gotoBLAS and fftw for example).
>
> That said, have you tried for Intel:
> http://www.intel.com/cd/software/products/asmo-na/eng/308295.htm
>
> or for AMD: http://developer.amd.com/devtools.jsp (they link to HP's
> MPI)

I'd disagree with this. The metric normally considered most important for MPI is network latency, so you are often better off using the MPI provided with your network. Where the chip vendors have experience and are able to provide benefit is in a fast memcpy() implementation, which is what will limit bandwidth for intra-node comms, so you should compile your MPI library with the chip-vendor approved compiler.

As for shared memory comms, the problem with using shared memory within a node is that two copies of the data are explicitly required, one copy-in and one copy-out. The MPIs with the best intra-node bandwidth will be the ones which use an in-kernel userspace-to-userspace memcpy (which is *very* well optimised) to halve the amount of memory traffic needed to perform the data move.

Finally, just to really throw you, I've seen occasions where *not* using intra-node optimisations at all is the right thing to do. Communicating within a node consumes CPU cycles, and if your code overlaps comms and compute to such an extent that latency is not a large factor, handing the comms off to the NIC to be handled asynchronously without CPU intervention can improve performance.

Ashley,
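
To make the copy-in/copy-out point concrete, here is a toy model that only counts the bytes moved for one intra-node message. It is a sketch under stated assumptions, not code from any MPI library: `CopyCounter`, `send_via_shared_segment` and `send_via_direct_copy` are hypothetical names, and a plain slice assignment stands in for memcpy().

```python
# Toy model: count bytes moved through memory for one intra-node message.
# All names here are illustrative; this is not any MPI library's code.


class CopyCounter:
    """Performs copies between buffers and records the memory traffic."""

    def __init__(self):
        self.bytes_read = 0
        self.bytes_written = 0

    def copy(self, dst, src):
        n = len(src)
        dst[:n] = src  # stands in for memcpy(dst, src, n)
        self.bytes_read += n
        self.bytes_written += n

    @property
    def traffic(self):
        # Total bytes read plus bytes written across all copies.
        return self.bytes_read + self.bytes_written


def send_via_shared_segment(payload, counter):
    """Copy-in to a shared buffer, then copy-out to the receiver."""
    shared = bytearray(len(payload))
    recv = bytearray(len(payload))
    counter.copy(shared, payload)  # sender side: copy-in
    counter.copy(recv, shared)     # receiver side: copy-out
    return recv


def send_via_direct_copy(payload, counter):
    """One userspace-to-userspace copy, as a kernel-assisted path would do."""
    recv = bytearray(len(payload))
    counter.copy(recv, payload)
    return recv


if __name__ == "__main__":
    msg = bytes(range(256)) * 4096  # 1 MiB payload
    two_copy, one_copy = CopyCounter(), CopyCounter()
    assert send_via_shared_segment(msg, two_copy) == msg
    assert send_via_direct_copy(msg, one_copy) == msg
    print("shared segment traffic:", two_copy.traffic)
    print("direct copy traffic:   ", one_copy.traffic)
```

The model ignores caches, NUMA placement and the cost of entering the kernel, so it says nothing about latency; it only shows why the single-copy path moves half as many bytes for the same message.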
