Fortran compilers for Linux/mpich

Sun Nov 25 08:32:25 PST 2001

On Fri, 23 Nov 2001, Don Holmgren wrote:

> At the very bottom of the page,
>    http://qcdhome.fnal.gov/sse/
> I have a table with cycle counts posted for a number of matrix-matrix
> and matrix-vector routines as measured on a P-III (Coppermine), P4, and
> an Athlon MP.  Times are posted for both a pure-C version of each
> routine, built with gcc, as well as for an SSE version.  The sources
> for each are available at
>    http://qcdhome.fnal.gov/sse/catalog.html
> 
> The results are a mixed bag, with each flavor processor sometimes first,
> second, or third.  I'm using only a small subset of SSE - mostly shufps,
> addps, mulps, with a few xorps, movaps, and movups thrown in.  I haven't
> timed individual instructions on all three processors.
> 
> Don Holmgren
> Fermilab

Awesomely useful, Don, thanks.

Do you have any idea what the overall marginal benefit is of using your
hand-optimized routines when working on large datasets (too big to fit
into cache)?  In particular, does performance devolve to
memory-bandwidth-bound behavior (and hence end up being the same for
MILC and SSE and dominated by the memory bus speed)?
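
To make the question concrete, here is a rough back-of-envelope sketch
of the bound I have in mind.  It assumes 3x3 complex matrices in single
precision (my guess at what the routines operate on), no cache reuse at
all, and a purely hypothetical sustained memory bandwidth of 1000 MB/s;
the function names are made up for illustration and none of the numbers
are measurements.

```python
# Back-of-envelope bound for a complex matrix-vector multiply streamed
# from main memory.  All inputs are illustrative assumptions, not
# measurements from any of the processors discussed above.

def mat_vec_cost(n=3, bytes_per_real=4):
    """Return (flops, bytes_moved) for one n x n complex mat-vec."""
    complex_muls = n * n            # one multiply per matrix element
    complex_adds = n * (n - 1)      # n - 1 adds per output element
    # complex multiply = 4 mul + 2 add; complex add = 2 add
    flops = 6 * complex_muls + 2 * complex_adds
    # matrix + input vector + output vector, two reals per complex number
    reals_moved = 2 * n * n + 2 * n + 2 * n
    return flops, reals_moved * bytes_per_real

def bandwidth_bound_mflops(flops, bytes_moved, bandwidth_mb_per_s):
    """Upper bound on MFLOPS if every operand comes from main memory."""
    return flops * bandwidth_mb_per_s / bytes_moved

if __name__ == "__main__":
    flops, nbytes = mat_vec_cost()
    hypothetical_bw = 1000.0        # MB/s, placeholder value only
    bound = bandwidth_bound_mflops(flops, nbytes, hypothetical_bw)
    print(flops, "flops,", nbytes, "bytes per mat-vec")
    print("bandwidth-bound ceiling:", bound, "MFLOPS")
```

If both the C and the SSE versions can exceed a ceiling like this when
running out of cache, I would expect them to look about the same once
the data has to come from main memory.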

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu