gromacs benchmark and quad Xeons

Velocet math at
Tue Jun 18 11:14:02 PDT 2002

Posting this to beowulf instead of gromacs because I figure there's more
knowledge about compilers/architectures here.

I have access to a quad Xeon via a friend at a large media company:
1.6GHz per CPU with 4 gigs of RAM. Rather sweet, and nothing else is running
on the new install yet, so he can use my gromacs test as a benchmark.

He has limited time to waste on me, however, so although I normally tweak
everything by hand when installing gromacs, I just installed the RPMs this
time (though I did install the proper fftw-lam RPM instead of the non-LAM one).

We got the job running in short order (about 15 minutes, including finding
all the RPM URLs, installing them, and typing on IRC at my friend, who knows
nothing about gromacs :) and the job is go.

However, on the 1.6GHz quad Xeon we're seeing times for the d.dppc job of
about 3-4 HOURS to completion. This is really slow, especially for 1.6GHz
CPUs with such huge caches (David van der Spoel was suggesting that the
superlinear speedups I saw on dual Athlons were due to the large caches and
reduced thrashing compared with 1 CPU). Following this logic, we should see
some incredible performance for 4x Xeons.
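
For reference, the superlinear claim can be put in numbers: speedup is the
single-CPU wall time divided by the N-CPU wall time, and it is superlinear
when it exceeds N. A minimal Python sketch follows; the function names and the
timings in the example are made up for illustration and are not measurements
from the Athlon or Xeon runs.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-CPU wall time to N-CPU wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cpus):
    """Speedup per CPU; 1.0 means perfect linear scaling."""
    return speedup(t_serial, t_parallel) / n_cpus


def is_superlinear(t_serial, t_parallel, n_cpus):
    """True when N CPUs finish more than N times faster than one CPU."""
    return speedup(t_serial, t_parallel) > n_cpus


if __name__ == "__main__":
    # Hypothetical timings in minutes, chosen only to illustrate the idea.
    t1, t2, n = 100.0, 45.0, 2
    print("speedup:    %.2f" % speedup(t1, t2))
    print("efficiency: %.2f" % efficiency(t1, t2, n))
    print("superlinear:", is_superlinear(t1, t2, n))
```

An efficiency above 1.0 is what the cache explanation would predict when each
CPU's share of the problem fits in cache better than the whole problem does on
one CPU.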

Are the original RPMs for gromacs, which are generally compiled for P2/MMX,
going to be absolutely the worst possible situation for a Xeon? Or should
they work relatively near best speed (minus the missing SSE and SSE2
instructions, though I don't even know if GCC gives the full set of either for a hand

What's wrong here? Any ideas?
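
One quick check on the SSE/SSE2 question is to see which SIMD extensions the
kernel reports for the CPUs. A rough Python sketch, assuming a Linux x86 box
where /proc/cpuinfo has a "flags" line; the helper names are hypothetical. It
only shows what the CPUs support, not which instructions the RPM binaries were
actually compiled to use.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_report(flags):
    """Map each SIMD extension of interest to whether the CPU reports it."""
    return {name: name in flags for name in ("mmx", "sse", "sse2")}


if __name__ == "__main__":
    # Assumes Linux; on other systems the file will not exist.
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()
    for name, present in sorted(simd_report(flags).items()):
        print("%-5s %s" % (name, "yes" if present else "no"))
```

If sse and sse2 show up here but the installed binaries only use MMX-era code
paths, that would point toward rebuilding from source rather than a hardware
problem.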

Ken Chase, math at  *  Velocet Communications Inc.  *  Toronto, CANADA 
