Performance Variations using MPI/Myrico

Steffen Persvold sp at
Fri Apr 27 09:02:38 PDT 2001

Patrick Geoffray wrote:
> Steffen Persvold wrote:
> > Hmm, the NAS application runs in user space, and since this inner loop
> > (FFT code) runs without any communication with other nodes, why would an
> > SSE-patched kernel improve its memcpy performance? I would assume that
> > the memcpy calls in the FFT code were either inlined by the compiler or
> > compiled into a call to libc's memcpy. It shouldn't involve any
> > system (kernel) time at all, right?
> Hi Steffen,
> Yes, the NAS FT code does not call the "memcpy()" library function. The copy
> step of the FFT is an explicit loop of assignments, and the PGI compiler
> is smart enough to use SSE prefetching to optimize this part of the code
> if SSE is available. But without a specific patch, the Linux kernel does
> not enable SSE support (the kernel has to save the FP and SSE
> registers during context switches), so PGI's SSE optimization for the
> PIII is useless. Now I am wondering whether compiling with
> -Mvect=sse or -Mvect=prefetch with pgf90 WITHOUT SSE support enabled
> in the kernel is the source of this instability.
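The explicit copy loop with SSE prefetching described above can be sketched by hand in C. This is only an illustration, not what pgf90 actually emits: the function name `copy_with_prefetch`, the look-ahead distance `PREFETCH_AHEAD`, and the one-prefetch-per-8-doubles stride are hypothetical tuning choices, and the sketch assumes an x86 compiler that provides `<xmmintrin.h>`. It issues `prefetchnta` through `_mm_prefetch(..., _MM_HINT_NTA)`, which is only a cache hint and does not touch the XMM registers.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical look-ahead: prefetch 64 doubles (512 bytes) ahead of the copy. */
#define PREFETCH_AHEAD 64

/* Copy n doubles from src to dst (non-overlapping), issuing a non-temporal
 * prefetch once every 8 doubles (64 bytes) for data needed later. */
void copy_with_prefetch(double *restrict dst, const double *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the source array. */
        if (i % 8 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&src[i + PREFETCH_AHEAD], _MM_HINT_NTA);
        dst[i] = src[i];  /* the plain assignment loop itself */
    }
}
```

Because the prefetch is only a hint, removing it changes the memory-access timing but never the copied values.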

Actually, running SSE code (any SSE "mov" instruction) on a kernel
which doesn't save the SSE registers across context switches results in a
segmentation fault.

I learned this the hard way:

The original Red Hat 6.2 kernel (2.2.14-5.0) had PIII support and therefore saved
the SSE registers, but when Red Hat released a kernel update because of data
loss during context switches (RHBA-2000:013-01), I upgraded to 2.2.14-6.0.1.
This kernel, however, did not have SSE support enabled, and my hand-coded SSE
routines suddenly caused segmentation faults.

There are, however, some SSE instructions that do not require the kernel to
save registers on a context switch (e.g. "sfence" and "prefetchnta").
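One way to check at run time whether the running kernel has SSE enabled is to execute a single SSE register instruction under a signal handler and see whether it traps. This is a sketch assuming GCC or Clang on x86 Linux with SSE enabled in the compiler (the default on x86-64); `os_supports_sse` is a hypothetical helper name. On x86, an SSE instruction executed while the kernel has not enabled SSE normally raises an invalid-opcode exception that Linux reports as SIGILL; since a segmentation fault was observed above, the sketch traps SIGSEGV as well. It deliberately uses `xorps` rather than `prefetchnta` or `sfence`, since those would not trap.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf sse_probe_env;

/* Jump back out of the faulting instruction instead of dying. */
static void sse_probe_handler(int sig)
{
    (void)sig;
    siglongjmp(sse_probe_env, 1);
}

/* Hypothetical helper: returns 1 if an SSE register instruction executes,
 * 0 if it raises SIGILL or SIGSEGV. */
int os_supports_sse(void)
{
    struct sigaction sa, old_ill, old_segv;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sse_probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    if (sigsetjmp(sse_probe_env, 1) == 0) {
        /* Touches an XMM register, so it needs OS support for SSE state. */
        __asm__ volatile ("xorps %%xmm0, %%xmm0" ::: "xmm0");
        ok = 1;
    }

    /* Restore whatever handlers were installed before the probe. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return ok;
}
```

On any x86-64 Linux system this returns 1, because SSE is part of the baseline ABI there; the check is only meaningful on 32-bit kernels such as the 2.2 series discussed here.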

> Anyway, a 50% variation for a pure computation piece of code seems too
> large to be explained by the SSE support. SSE on the PIII is single
> precision only, so it does not help to get more double-precision Flops.
> Maybe there is something else in the patch they applied; I will look at it.
I agree.

 Steffen Persvold                        Systems Engineer
 Email  : mailto:sp at            Scali AS (
 Norway : Tel  : (+47) 2262 8950         Olaf Helsets vei 6
          Fax  : (+47) 2262 8951         N-0621 Oslo, Norway

 USA    : Tel  : (+1) 713 706 0544       10500 Richmond Avenue, Suite 190
                                         Houston, Texas 77042, USA

More information about the Beowulf mailing list