LAM SMP performance

Fri Dec 8 12:18:50 PST 2000

Patrick Geoffray wrote:
> 
> On Fri, 8 Dec 2000, Josip Loncaric wrote:
> 
> > memory performance using usysv transport on our SMP boxes is about as
> > good as the hardware can deliver (1 microsecond latency, 266 Mbyte/s
> > peak bandwidth).  MPICH shared memory performance is not as good (16
> > microsecond latency, 235 Mbyte/s peak bandwidth).  On the minus side,
> 
> I am very surprised by the SMP performance. 1 us is very very (too) low,
> it's the cost of a system call. usysv uses SYS V semaphores, and I don't
> think it's possible to reach this level of latency with them.

I believe that you are thinking of sysv (semaphores).  LAM compiled with
usysv uses spinlocks, and the peak 266 Mbyte/s bandwidth is reached for
8 KB cache-to-cache copies.  Main memory gets involved only for larger
messages, and then the bandwidth drops to 127 Mbyte/s.  See my raw data
(NPmpi from netpipe-2.3) at:

http://www.icase.edu/~josip/phase23-64-TCP-lam/NPmpi.out
http://www.icase.edu/~josip/phase23-64-TCP-mpich/NPmpi.out

and the summary of my findings at

http://www.icase.edu/~josip/MPIonCoral.html
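
To show why the latency difference matters more than the peak numbers
suggest, here is a rough sketch using the figures quoted above (1 us and
266 Mbyte/s for LAM usysv, 16 us and 235 Mbyte/s for MPICH).  It assumes
a simple linear model, time = latency + size / peak bandwidth, and
1 Mbyte = 10**6 bytes.  Both are my assumptions for illustration, not
something NetPIPE reports, and the model only makes sense while messages
still fit in cache.  The function and dictionary names are made up.

```python
# Linear cost model (an assumption): time = latency + size / peak_bandwidth.
MBYTE = 1_000_000  # assumed convention: 1 Mbyte = 10**6 bytes

def transfer_time(size_bytes, latency_s, peak_bw_bytes_per_s):
    """Estimated one-way time in seconds for a message of size_bytes."""
    return latency_s + size_bytes / peak_bw_bytes_per_s

def effective_bandwidth(size_bytes, latency_s, peak_bw_bytes_per_s):
    """Estimated delivered bandwidth in bytes/s for a message of size_bytes."""
    if size_bytes == 0:
        return 0.0
    return size_bytes / transfer_time(size_bytes, latency_s, peak_bw_bytes_per_s)

# (latency in seconds, peak bandwidth in bytes/s), taken from the figures above.
LIBRARIES = {
    "LAM usysv": (1e-6, 266 * MBYTE),
    "MPICH shmem": (16e-6, 235 * MBYTE),
}

if __name__ == "__main__":
    for size in (64, 1024, 8192):
        for name, (latency, peak_bw) in LIBRARIES.items():
            eff = effective_bandwidth(size, latency, peak_bw) / MBYTE
            print(f"{size:6d} bytes  {name:12s} {eff:7.1f} Mbyte/s")
```

Under this model, small messages are dominated by latency, so the gap
between the two libraries is much wider there than the peak bandwidth
figures alone would indicate.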

Also, I'm told that LAM reaches similar shared memory performance levels
on Suns (Solaris).

Important: LAM with usysv (spinlocks) works great, but performance can
drop by a factor of 100,000 if more than one process per CPU is
started, presumably because a spinning process burns its time slice
while the process it is waiting for cannot run.  If you must use more
than one process per CPU, compile LAM with sysv (semaphores) instead.
Benchmark your code and pick the best library for the job.
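
The rule of thumb above can be written down as a small helper.  This is
only a sketch: the function name is hypothetical, it is not part of LAM,
and it only returns the transport name to build with.  It knows nothing
about how your LAM installation is configured.

```python
import os

def choose_lam_transport(num_processes, num_cpus=None):
    """Pick the transport name following the rule of thumb above.

    Returns "usysv" (spinlocks) when every process can have its own CPU,
    and "sysv" (semaphores) when CPUs would be oversubscribed.
    """
    if num_cpus is None:
        # os.cpu_count() may return None; fall back to 1 CPU (an assumption).
        num_cpus = os.cpu_count() or 1
    if num_processes < 1 or num_cpus < 1:
        raise ValueError("num_processes and num_cpus must be at least 1")
    return "usysv" if num_processes <= num_cpus else "sysv"

if __name__ == "__main__":
    # Example: a dual-CPU box running 2 processes, then 4 processes.
    print(choose_lam_transport(2, num_cpus=2))
    print(choose_lam_transport(4, num_cpus=2))
```

Treat the result as a starting point only, and benchmark both builds with
your own code.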

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Senior Staff Scientist        mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134