Performance Variations using MPI/Myrinet

Steffen Persvold sp at
Fri Apr 27 07:42:55 PDT 2001

Patrick Geoffray wrote:
> > 3) Any ideas on what could cause this much variation?
> I have some ideas, but nothing I would bet on. Mainly cache thrashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed as a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)

Hmm, the NAS application runs in userspace, and since this inner loop (FFT code) runs without any communication with
other nodes, why would an SSE-patched kernel improve its memcpy performance? I would believe that the memcpy calls in
the FFT code were either inlined by the compiler or made to libc's memcpy. They shouldn't involve any
system (kernel) time at all, right?
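
One rough way to check this would be to compare the user and system CPU time of a pure copy loop. The sketch below is only an illustration and not the NAS FFT code: it uses Python's os.times() and a bytearray slice assignment as a stand-in for memcpy, and the function name, buffer size, and repeat count are all made up for the example. Both buffers are touched before timing starts so that first-touch page faults are not counted.

```python
import os


def time_user_copy(size_bytes=8 * 1024 * 1024, repeats=50):
    """Copy a buffer repeatedly; return (user_s, system_s, copy_ok).

    Hypothetical helper for illustration only. The slice assignment
    stands in for the memcpy call discussed above.
    """
    src = bytearray(b"\xa5" * size_bytes)
    dst = bytearray(size_bytes)

    # Warm-up copy: make sure every page of both buffers is already
    # mapped, so page-fault handling is not charged to the timed loop.
    dst[:] = src

    before = os.times()
    for _ in range(repeats):
        dst[:] = src  # same-length slice assignment copies in place
    after = os.times()

    user_s = after.user - before.user
    system_s = after.system - before.system
    return user_s, system_s, dst == src


if __name__ == "__main__":
    user_s, system_s, ok = time_user_copy()
    print("user time   (s):", user_s)
    print("system time (s):", system_s)
    print("copy verified  :", ok)
```

If the copy really stays in userspace, one would expect the system time to remain small relative to the user time, whichever kernel is running. The clock resolution of os.times() is coarse, so short runs may report zero for both.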

 Steffen Persvold                        Systems Engineer
 Email  : mailto:sp at            Scali AS (
 Norway : Tlf  : (+47) 2262 8950         Olaf Helsets vei 6
          Fax  : (+47) 2262 8951         N-0621 Oslo, Norway

 USA    : Tlf  : (+1) 713 706 0544       10500 Richmond Avenue, Suite 190
                                         Houston, Texas 77042, USA

More information about the Beowulf mailing list