gromacs benchmark and quad Xeons
rbw at ahpcrc.org
Wed Jun 19 08:19:30 PDT 2002
On Wed, 19 Jun 2002 01:31:18 -0400, Velocet wrote:
>On Tue, Jun 18, 2002 at 04:28:04PM -0500, Richard Walsh's all...
>> On Tue, Jun 18, 2002 at 02:14:02PM -0400, Velocet wrote:
>> > What's wrong here? Any ideas?
>> My first reaction is to wonder what your memory footprint is ...
>> the bandwidth of all quads I have seen (except the ES-45 with
>> Compaq's typhoon chipset and cross-bar) has been lousy. If you
>> are not running mostly/completely in cache this could be an
>> issue. Competition from all four CPUs for bandwidth could kill
>> performance. Have you tried running on one CPU? Can you shrink the test
>> case to make sure it is in cache and then look at the performance?
>Well, we went and recompiled everything by hand with all proper options
>and got it to run.
>We did some quick tests on the d.dppc benchmark with only 500 steps ('cuz
>we don't have all night :)
>we were seeing this:
>         throughput   per cpu    scaling vs 1cpu   scaling vs 2cpu
>1 cpu     94 ps/day   94 / cpu   100%
>2 cpu    144 ps/day   72 / cpu    77%              100%
>4 cpu    228 ps/day   57 / cpu    61%               79%
>Per cpu, each configuration scales to about 75-80% of the one with half as many
>cpus. Quite odd. I'm sure we must have something misconfigured, or, as you say,
>it could be that the memory bandwidth is being thrashed. 94 ps/day for a single
>cpu is already phenomenal - my 1.33 GHz Tbirds were giving me 63 or so on my
>Tyan 2466s and 60.4 on the PcChips M817 LMR. 94 is 1.5x faster for a mere 1.2x
>increase in clock. (1.6 GHz Xeons here, with their sweet 256/512/1M l1/2/3 caches)
On the single CPU test all (or most) of the bandwidth on the board is
available to the single CPU, making the job as little memory bound as
possible. You are not seeing a cache-effect superlinear speedup here.
This is perhaps because the 512 KByte cache on the Prestonia is large
enough to hold things on the single processor test. The fact that
you are seeing the same percentage degradation when you go from
2 to 4 CPUs as from 1 to 2 suggests something in addition to, or
other than, bandwidth issues.
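
To make the "same percentage degradation" point concrete, here is a rough
sketch of the arithmetic in Python, using only the quad Xeon ps/day figures
quoted above. The function name and the layout of the result are my own
choices for illustration, not anything that comes from GROMACS.

```python
def per_cpu_efficiency(results):
    """Compute per-CPU throughput and scaling from total throughput.

    results: dict mapping CPU count -> total throughput in ps/day.
    Returns a list of tuples:
        (cpus, ps/day per cpu, fraction of the smallest run's per-cpu rate,
         fraction of the previous run's per-cpu rate)
    """
    rows = []
    counts = sorted(results)
    base = results[counts[0]] / counts[0]  # per-cpu rate of the smallest run
    prev = None
    for n in counts:
        per_cpu = results[n] / n
        vs_base = per_cpu / base
        # The first row has no previous step, so report it as 1.0.
        vs_prev = per_cpu / prev if prev is not None else 1.0
        rows.append((n, per_cpu, vs_base, vs_prev))
        prev = per_cpu
    return rows


# Quad Xeon figures from the quoted table (d.dppc, 500 steps).
xeon = {1: 94, 2: 144, 4: 228}

for n, per_cpu, vs_base, vs_prev in per_cpu_efficiency(xeon):
    print(f"{n} cpu: {per_cpu:5.1f} ps/day/cpu  "
          f"{vs_base:4.0%} of 1 cpu  {vs_prev:4.0%} of previous step")
```

The last column is the one I am looking at: each doubling keeps roughly the
same fraction of the previous per-cpu rate. If saturating the memory bus were
the whole story, I would expect the 2 to 4 step to lose more than the 1 to 2
step did.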
>For comparison, with dual CPU Tyan 2466s and 1.333 GHz Tbirds (test setup to
>see if tbirds worked on 2460s/66s) with Ns83820 GBE (direct connect, no
>switch) we saw:
>         throughput   per cpu    scaling vs 1cpu         scaling vs 2cpu
>1 cpu     63 ps/day   63 / cpu   100%
>2 cpu    179 ps/day   90 / cpu   141% (superlinear!)     100%
>4 cpu    310 ps/day   78 / cpu   123% (still super)       87%
>So perhaps that's why.
Here the last 2 of 4 processors are on another board ... so you get
a linear increase in bandwidth which supports the better actual
performance. Molecular dynamics codes tend not to scale too well
and are latency sensitive.
>Here's the machine: (1.6 GHz Xeons, 8 GB RAM)
I will take a look at it.