LAM SMP performance
Josip Loncaric
josip at icase.edu
Fri Dec 8 12:18:50 PST 2000
Patrick Geoffray wrote:
>
> On Fri, 8 Dec 2000, Josip Loncaric wrote:
>
> > memory performance using usysv transport on our SMP boxes is about as
> > good as the hardware can deliver (1 microsecond latency, 266 Mbyte/s
> > peak bandwidth). MPICH shared memory performance is not as good (16
> > microsecond latency, 235 Mbyte/s peak bandwidth). On the minus side,
>
> I am very surprised by the SMP performance. 1 us is very very (too) low,
> it's the cost of a system call. usysv uses SYS V semaphores, and I don't
> think it's possible to reach this level of latency with them.

I believe you are thinking of sysv, which uses SYS V semaphores. LAM
compiled with usysv uses spinlocks instead, and the peak 266 Mbyte/s
bandwidth is reached for 8 KB messages, which are cache-to-cache copies.
Main memory gets involved only for larger messages, and the bandwidth
then drops to 127 Mbyte/s. See my raw data (NPmpi from netpipe-2.3) at:
http://www.icase.edu/~josip/phase23-64-TCP-lam/NPmpi.out
http://www.icase.edu/~josip/phase23-64-TCP-mpich/NPmpi.out
and the summary of my findings at
http://www.icase.edu/~josip/MPIonCoral.html
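
To put the bandwidth figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of NetPIPE or LAM. It assumes that a throughput figure is simply message size divided by one-way transfer time and that 1 Mbyte means 10^6 bytes. The function name and the 1 MiB example size are hypothetical, chosen only for illustration.

```python
def transfer_time_us(size_bytes, bandwidth_mbyte_s):
    """One-way transfer time in microseconds implied by a throughput figure.

    Assumption: throughput = size / one-way time, with 1 Mbyte = 10**6 bytes,
    so bytes divided by Mbyte/s comes out directly in microseconds.
    """
    return size_bytes / bandwidth_mbyte_s


# Figures quoted in the message above (LAM with usysv).
PEAK_MBYTE_S = 266.0   # 8 KB cache-to-cache copies
LARGE_MBYTE_S = 127.0  # larger messages that involve memory

t_8k = transfer_time_us(8 * 1024, PEAK_MBYTE_S)
t_1m = transfer_time_us(1024 * 1024, LARGE_MBYTE_S)  # 1 MiB is a hypothetical size

print(f"8 KB at {PEAK_MBYTE_S:.0f} Mbyte/s: about {t_8k:.1f} us")
print(f"1 MiB at {LARGE_MBYTE_S:.0f} Mbyte/s: about {t_1m / 1000:.2f} ms")
print(f"bandwidth drop once memory is involved: {PEAK_MBYTE_S / LARGE_MBYTE_S:.2f}x")
```

Under these assumptions an 8 KB message at the peak rate takes roughly 31 microseconds one way, so the 1 microsecond versus 16 microsecond latency difference matters most for messages much smaller than that.
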
Also, I'm told that LAM reaches similar shared memory performance levels
on Suns (Solaris).

Important: LAM with usysv (spinlocks) works great as long as each process
has a CPU to itself, but performance can drop by a factor of 100,000 if you
start more than one process per CPU. If you must run more than one process
per CPU, compile LAM with sysv (semaphores) instead. Benchmark your code
and pick the best library for the job.
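
The rule of thumb above can be written down as a small Python sketch. The function names and the node sizes are hypothetical and are not part of LAM. Only the 1 microsecond latency and the factor of 100,000 come from this message. One plausible reading, which the message itself does not spell out, is that a spinning process keeps its CPU busy while the process it is waiting for cannot run, so each message ends up costing something closer to a scheduling interval than to a memory copy.

```python
def choose_transport(num_processes, num_cpus):
    """Rule of thumb from the message above: spinlocks (usysv) only when
    every process can have a CPU to itself, otherwise semaphores (sysv)."""
    if num_processes <= num_cpus:
        return "usysv"
    return "sysv"


def degraded_latency_ms(base_latency_us, slowdown_factor):
    """Latency in milliseconds after applying a slowdown factor."""
    return base_latency_us * slowdown_factor / 1000.0


# 1 us latency and the factor of 100,000 are the figures quoted above.
stall_ms = degraded_latency_ms(1.0, 100_000)
print(f"implied latency when oversubscribed: {stall_ms:.0f} ms")

# Hypothetical node sizes, for illustration only.
for procs, cpus in [(2, 2), (4, 2)]:
    print(f"{procs} processes on {cpus} CPUs -> {choose_transport(procs, cpus)}")
```

This is only a starting point for the choice. As the message says, benchmarking the actual application with both builds is what settles it.
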
Sincerely,
Josip
--
Dr. Josip Loncaric, Senior Staff Scientist mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134