Performance Variations using MPI/Myrinet
Steffen Persvold
sp at scali.no
Fri Apr 27 07:42:55 PDT 2001
Patrick Geoffray wrote:
> > 3) Any ideas on what could cause this much variation?
>
> I have some ideas, but nothing I would bet on. Mainly cache thrashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed as a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)
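Patrick's explanation hinges on SSE copies that avoid polluting the L2 cache. A minimal sketch of such a copy uses the standard x86 SSE intrinsics from <xmmintrin.h>: `_mm_prefetch` with the `_MM_HINT_NTA` hint, `_mm_stream_ps` non-temporal stores, and `_mm_sfence`. The helper name `copy_nontemporal`, the alignment and length requirements, and the 64-float prefetch distance are assumptions for illustration, not what any kernel patch or libc actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE intrinsics: _mm_prefetch, _mm_load_ps, _mm_stream_ps, _mm_sfence */

/* Assumed prefetch distance in floats (256 bytes ahead); illustrative only. */
#define PREFETCH_AHEAD 64

/* Hypothetical helper: copy n floats from src to dst with SSE, using
 * non-temporal prefetches and stores so the copy allocates as few
 * cache lines as possible. Requires n % 4 == 0 and 16-byte-aligned
 * pointers (needed by _mm_load_ps and _mm_stream_ps). */
static void copy_nontemporal(float *dst, const float *src, size_t n)
{
    assert(n % 4 == 0);
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);

    for (size_t i = 0; i < n; i += 4) {
        /* NTA hint: fetch source data ahead of use while minimizing
         * pollution of the outer cache levels. Guarded so we never form
         * a pointer past the end of the source array. */
        if (i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD), _MM_HINT_NTA);

        __m128 v = _mm_load_ps(src + i);

        /* Streaming store: goes through write-combining buffers instead
         * of allocating the destination line in the caches. */
        _mm_stream_ps(dst + i, v);
    }

    /* Streaming stores are weakly ordered; fence them before later stores. */
    _mm_sfence();
}
```

Because the streaming stores skip cache allocation for the destination, a large copy done this way should evict less of the L2 working set than a plain cached copy; whether that accounts for the NAS FFT variation is exactly the open question in this thread.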
Hmm, the NAS application runs in userspace and, since this inner loop (FFT code) runs without any communication with
other nodes, why would an SSE-patched kernel improve its memcpy performance? I would believe that the memcpy calls in
the FFT code were either inlined by the compiler or made as calls to libc's memcpy. Either way, they shouldn't involve any
system (kernel) time at all, right?
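Steffen's argument can be checked empirically: if the copies in the FFT loop stay in userspace, the CPU time they consume should be charged as user time rather than system time. The sketch below assumes a POSIX system providing `getrusage(RUSAGE_SELF, ...)` and times a loop of libc `memcpy` calls; the helper names `tv_seconds` and `time_user_copies` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: copy src into dst `reps` times with libc memcpy
 * and report the user and system CPU time charged to the process for
 * the loop. Both buffers should already be touched so that first-access
 * page faults (kernel work) are not counted in the measured interval. */
static void time_user_copies(void *dst, const void *src, size_t bytes, int reps,
                             double *user_s, double *sys_s)
{
    struct rusage before, after;

    assert(reps > 0);
    getrusage(RUSAGE_SELF, &before);
    for (int r = 0; r < reps; r++)
        memcpy(dst, src, bytes); /* pure userspace copy, no system call */
    getrusage(RUSAGE_SELF, &after);

    *user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *sys_s = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
}
```

With the buffers pre-touched, one would expect the system-time delta to stay near zero while the user-time delta grows with `reps`. Tick-based accounting on some kernels can still misattribute small amounts of time, so only large differences are meaningful.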
Regards,
--
Steffen Persvold Systems Engineer
Email : mailto:sp at scali.com Scali AS (http://www.scali.com)
Norway : Tlf : (+47) 2262 8950 Olaf Helsets vei 6
Fax : (+47) 2262 8951 N-0621 Oslo, Norway
USA : Tlf : (+1) 713 706 0544 10500 Richmond Avenue, Suite 190
Houston, Texas 77042, USA