Performance Variations using MPI/Myrinet

Greg Lindahl
Fri Apr 27 02:41:59 PDT 2001

> I have some ideas, but nothing I would bet on. Mainly cache thrashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed while a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)


> Greg: are your numbers for FT on Alpha or x86?

x86, a dual PIII. I wasn't using "enterprise edition" but this was
long enough ago that it's hard to believe that whatever kernel I used
had any SSE acceleration. I think it was vanilla RH 6.2.

