[Beowulf] Maximizing intra-node communication performance
Joe Landman
landman at scalableinformatics.com
Wed Dec 28 20:11:57 PST 2005
Hi Tahir:
Tahir Malas wrote:
> Hi all,
> Taking advice from a previous discussion, we have purchased a Tyan server
> with 8 dual-core Opteron 870 processors. Now I wonder how I can
> maximize the intra-node communication performance of the server. We have been using
By maximize, do you mean maximizing bandwidth? Minimizing latency? Both?
> LAM-MPI, but I think that the TCP/IP protocol may degrade the performance.
With mpich 1.2.x and the ch_p4 device, I am not sure whether it will
automatically use shared memory for MPI processes running on the same
machine; I suspect not.  I have used ch_shmem with such units with some
success, though you have to start worrying about contention for shared
memory arenas on a quad system when you are using a shared memory
device.  Also, you need to make sure that memory and processes are
pinned to the appropriate CPUs (affinity scheduling using numactl and
other bits).
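As a rough sketch of what I mean by pinning (untested, and it assumes a
2.6 kernel with sched_setaffinity() and a simple rank-to-core mapping on
a single node; numactl from the launch script handles the memory
placement side):

/* pin.c -- hypothetical sketch: pin each MPI rank on the node to its
 * own core.  Assumes Linux sched_setaffinity() and that ranks 0..15
 * map onto the 16 cores of an 8-socket dual-core Opteron box. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one core per rank; adjust the mapping for your topology */
    CPU_ZERO(&mask);
    CPU_SET(rank % 16, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    /* ... the computation now runs with this rank pinned ... */

    MPI_Finalize();
    return 0;
}

The same idea, done outside the code, is what numactl gives you, with
--membind keeping the memory local as well.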
> Has
> anybody tried new implementations of MPI, or does anybody know of some
> other support for intra-node communication?
With mpich 1.2.x you could use ch_shmem.  I have run into some
performance issues with this in the recent past, where an 8-way run on a
dual-core quad unit using mpich and the ch_shmem device was not as fast
as similar runs using other MPI stacks (mpich-ib, mpich-gm).  I have done
some very recent work with MPI and compiler bits from Pathscale for the
LAMMPS code (molecular dynamics) which has shown excellent scalability
per node and across nodes.
I have not been successful to date in getting LAMMPS to run with LAM.  LAM
7.x offers (IMO) some nice features/functionality relative to mpich 1.2.x.
The issues in running on large NUMA systems are significant.  For large
shared memory units with lots of memory controllers, you need to worry
about first-touch allocation (usually more of an issue with OpenMP).  You
really don't want lots of other things getting in the way of your
performance, so time spent traversing a network stack is to be avoided.
A good MPI implementation is in order.
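To make the first-touch point concrete, here is a small OpenMP sketch
(hypothetical, just to show the idea): pages land on the memory
controller of the thread that first writes them, so you initialize data
with the same thread decomposition you will later compute with.

/* first_touch.c -- sketch of first-touch placement under OpenMP. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double sum = 0.0;
    int i;

    /* first touch: each thread writes its own static chunk, so those
       pages end up on that thread's local memory controller */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < N; i++)
        a[i] = 1.0;

    /* later compute loops should use the same static distribution,
       so each thread mostly reads local memory */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    free(a);
    return 0;
}

If instead the master thread initializes the whole array, all of it sits
on one socket's memory and the other sockets spend their time going
across the HyperTransport links.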
If you will only run on individual nodes and never across nodes, OpenMP
can be quite powerful. Mixed model (MPI across nodes, OpenMP on each
node) is somewhat harder to do.
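For what it's worth, the skeleton of the mixed model looks something
like the following (a sketch only, with MPI calls kept outside the
threaded regions so plain MPI_Init is enough):

/* hybrid.c -- minimal mixed-model skeleton: MPI across nodes,
 * OpenMP threads within each node. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one MPI rank per node; each rank fans out into threads here */
    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    /* communication (sends/receives, collectives, ...) happens back
       out here, in the serial part of each rank */

    MPI_Finalize();
    return 0;
}

The hard part is not the skeleton, it is keeping both levels of
parallelism busy and the data decomposed sensibly across them.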
Joe
> Thanks in advance,
> Tahir Malas
> Bilkent University
> Electrical and Electronics Engineering Department
> Phone: +90 312 290 1385
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615