Performance Variations using MPI/Myrico
Steffen Persvold
sp at scali.no
Fri Apr 27 09:02:38 PDT 2001
Patrick Geoffray wrote:
>
> Steffen Persvold wrote:
>
> > Hmm, the NAS application runs in user space, and since this inner loop
> > (FFT code) runs without any communication with other nodes, why would an
> > SSE-patched kernel improve its memcpy performance? I would believe that
> > the memcpy calls in the FFT code were either inlined by the compiler or
> > compiled as calls to libc's memcpy. Either way, no
> > system (kernel) time should be involved at all, right?
>
> Hi Steffen,
>
> Yes, the NAS FT code does not call the "memcpy()" library function. The copy
> step of the FFT is an explicit loop of assignments, and the PGI compiler
> is smart enough to use SSE prefetching to optimize this part of the code
> when SSE is available. But without a specific patch, the Linux kernel does
> not enable SSE support (basically, the kernel has to save the FP and
> SSE registers during context switches), so PGI's SSE optimizations for
> the PIII are useless. Now I am wondering whether compiling with
> -Mvect=sse or -Mvect=prefetch with pgf90 WITHOUT SSE support enabled
> in the kernel might be the source of this instability.
Actually, running SSE code (code involving SSE "mov" instructions, for example)
on a kernel that doesn't save the SSE registers across context switches would
result in a segmentation fault.
I learned this the hard way:
The original RH6.2 kernel (2.2.14-5.0) had PIII support and therefore saved the
SSE registers, but when RH released a kernel update because of data loss during
context switches (RHBA-2000:013-01), I upgraded to 2.2.14-6.0.1. This kernel,
however, did not have SSE support enabled, and my hand-coded SSE routines
suddenly caused segmentation faults.
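The failure mode described above can be detected at run time. The following is a minimal sketch in C, assuming a POSIX system and GCC-style inline assembly on x86; the helper name `sse_register_ops_usable` is hypothetical. It executes one instruction that writes an XMM register and recovers from the trap if the kernel has not enabled SSE. On x86 such an instruction is normally rejected as an invalid opcode (usually delivered as SIGILL), while the post above describes a segmentation fault, so the sketch catches both signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to recover if the probe instruction traps. */
static sigjmp_buf sse_probe_env;

static void sse_probe_handler(int sig)
{
    (void)sig;
    siglongjmp(sse_probe_env, 1);  /* abandon the faulting instruction */
}

/* Hypothetical helper: returns 1 if an instruction that writes an XMM
 * register executes without trapping, 0 if it raised SIGILL or SIGSEGV
 * (or if this is not an x86 build at all). */
static int sse_register_ops_usable(void)
{
#if defined(__i386__) || defined(__x86_64__)
    struct sigaction sa, old_ill, old_segv;
    volatile int usable = 0;   /* volatile: value must survive siglongjmp */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sse_probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* savemask = 1 so the signal mask is restored after a longjmp. */
    if (sigsetjmp(sse_probe_env, 1) == 0) {
        /* xorps writes %xmm0; it traps if the OS has not enabled SSE. */
        __asm__ volatile ("xorps %%xmm0, %%xmm0" ::: "xmm0");
        usable = 1;
    }

    /* Restore whatever handlers were installed before the probe. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return usable;
#else
    return 0;
#endif
}
```

On x86-64, SSE is part of the architectural baseline and the kernel always enables it, so the probe simply returns 1 there; it is only informative on old 32-bit kernels like the ones discussed in this thread.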
There are, however, some SSE instructions that don't require the SSE registers
to be saved across context switches (e.g. "sfence" and "prefetchnta").
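As a sketch of the prefetch-only approach, the C function below copies an array with plain assignments and adds non-temporal software prefetches. It assumes GCC or Clang: `__builtin_prefetch` is a compiler builtin, not standard C, and with a locality argument of 0 GCC typically emits `prefetchnta` on x86 targets with SSE. The function name, the prefetch distance and the 32-byte line size are illustrative assumptions, and this is not the code PGI actually generates for the FT copy step.

```c
#include <stddef.h>

/* Assumed 32-byte cache lines (as on the Pentium III): 4 doubles per line. */
#define DOUBLES_PER_LINE 4
/* How many elements ahead to prefetch; an illustrative guess, not tuned. */
#define PREFETCH_DISTANCE 64

/* Hypothetical helper: copy n doubles from src to dst with plain
 * assignments, issuing one non-temporal read prefetch per cache line
 * for data PREFETCH_DISTANCE elements ahead of the current index. */
static void copy_with_prefetch(double *restrict dst,
                               const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 0 (no temporal reuse expected). */
            __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 0);
        }
        dst[i] = src[i];
    }
}
```

A prefetch is only a hint and loads nothing into XMM registers. An optimizing compiler may still vectorize the copy loop itself with SSE moves if the target flags allow it, so whether the generated code touches XMM state depends on how it is compiled.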
>
> Anyway, a 50% variation in a pure computation piece of code seems too
> large to be explained by SSE support. SSE on the PIII is single
> precision only, so it does not raise the flop rate of double-precision
> code like FT. Maybe there is something else in the patch that they
> applied; I will look at it.
>
I agree.
Regards,
--
Steffen Persvold Systems Engineer
Email : mailto:sp at scali.com Scali AS (http://www.scali.com)
Norway : Tel : (+47) 2262 8950 Olaf Helsets vei 6
Fax : (+47) 2262 8951 N-0621 Oslo, Norway
USA : Tel : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190
Houston, Texas 77042, USA