gromacs benchmark and quad Xeons

Richard Walsh rbw at ahpcrc.org
Wed Jun 19 08:19:30 PDT 2002


On Wed, 19 Jun 2002 01:31:18 -0400, Velocet wrote:

>On Tue, Jun 18, 2002 at 04:28:04PM -0500, Richard Walsh's all...
>> On Tue, Jun 18, 2002 at 02:14:02PM -0400, Velocet wrote:
>> ...
>> > What's wrong here? Any ideas?
>> 
>> My first reaction is to wonder what your memory footprint is ...
>> the bandwidth of all quads I have seen (except the ES-45 with
>> Compaq's typhoon chipset and cross-bar) has been lousy. If you
>> are not running mostly/completely in cache this could be an
>> issue.  Competition from all four CPUs for bandwidth could kill
>> performance. 
>> 
>> Have you tried running on one CPU? Can you shrink the test
>> case to make sure it fits in cache and then look at the performance?
>
>Well, we went and recompiled everything by hand with all proper options
>and got it to run.
>
>We did some quick tests on the d.dppc benchmark with only 500 steps ('cuz
>we don't have all night :)
>
>we were seeing this:
>         total        per cpu      scaling vs 1cpu        scaling vs 2cpu
>1 cpu   94 ps/day       94 / cpu        100%                    
>2 cpu  144 ps/day       72 / cpu         77%                    100%
>4 cpu  228 ps/day       57 / cpu         61%                     79%
>
>The per-CPU scaling is about 75-80% of the run with half as many CPUs. Quite
>odd. I'm sure we must have something misconfigured, or, as you say, it could
>be that the memory bandwidth is being thrashed. 94 ps/day for a single CPU is
>already phenomenal - my 1.33 GHz Tbirds were giving me 63 or so on my Tyan
>2466s and 60.4 on the PcChips M817 LMR. 94 is 1.5x faster for a mere 1.2x
>increase in clock. (1.6 GHz Xeons here, with their sweet 256/512/1M
>L1/L2/L3 caches)
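
Just to be explicit about how I am reading your numbers: the per-CPU
efficiency is the ps/day on N CPUs, divided by N, divided by the
single-CPU ps/day. A trivial check with the figures from your table
(nothing here that you did not already post):

  /* scaling check: efficiency(N) = (ps_day(N) / N) / ps_day(1) */
  #include <stdio.h>

  int main(void)
  {
      double ps_day[] = { 94.0, 144.0, 228.0 };   /* your quad Xeon numbers */
      int    ncpu[]   = { 1, 2, 4 };
      int    i;

      for (i = 0; i < 3; i++) {
          double per_cpu = ps_day[i] / ncpu[i];
          printf("%d cpu: %5.1f ps/day/cpu  %3.0f%%\n",
                 ncpu[i], per_cpu, 100.0 * per_cpu / ps_day[0]);
      }
      return 0;
  }

That reproduces your 100/77/61% column, so the arithmetic is fine; the
question is where the per-CPU throughput is going.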

On the single-CPU test all (or most) of the bandwidth on the board is
available to that one CPU, making the job as little memory-bound as
possible. You are not seeing a cache-effect super-linear speedup here,
perhaps because the 512 KByte cache on the Prestonia is large enough
to hold things even on the single-processor test. The fact that you
are seeing roughly the same percentage degradation when you go from
2 to 4 CPUs suggests something in addition to (or besides) bandwidth
issues.
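
If you want to separate the bandwidth question from everything else,
something along the lines of a STREAM-style triad run as 1, 2 and 4
simultaneous copies (one per CPU) will show how much the per-CPU
bandwidth drops as they compete for the bus. A rough, untuned sketch;
the array size is only chosen to be far larger than any of the caches:

  /* crude bandwidth probe: build it, then run 1, 2 or 4 copies at once
     ("./bw & ./bw & ./bw & ./bw") and compare the MB/s each one prints. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>

  #define N (8*1024*1024)            /* 8M doubles per array, ~64 MB each */

  static double wall(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + 1e-6 * tv.tv_usec;
  }

  int main(void)
  {
      double *a = malloc(N * sizeof *a);
      double *b = malloc(N * sizeof *b);
      double *c = malloc(N * sizeof *c);
      double t;
      long i;

      if (!a || !b || !c) { perror("malloc"); return 1; }
      for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

      t = wall();
      for (i = 0; i < N; i++)        /* triad: a = b + s*c */
          a[i] = b[i] + 3.0 * c[i];
      t = wall() - t;

      /* three arrays of 8-byte doubles are touched once each;
         printing a[0] keeps the compiler from dropping the stores */
      printf("%.0f MB/s (a[0]=%g)\n", 3.0 * 8.0 * N / t / 1e6, a[0]);
      return 0;
  }

If four copies each report roughly a quarter of what one copy reports,
the quad is bandwidth starved and the MD run will see the same thing.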

>
>For comparison, with dual CPU tyan 2466s and 1.333 Ghz Tbirds (test setup to
>see if tbirds worked on 2460s/66s) with Ns83820 GBE (direct connect, no
>switch) we saw:
>
>1 cpu    63 ps/day      63 / cpu        100%
>2 cpu   179 ps/day      90 / cpu        141% (superlinear!)     100%
>4 cpu   310 ps/day      78 / cpu        123% (still super)       87%
>
>So perhaps thats why.

Here the last 2 of the 4 processors are on another board ... so you get
a linear increase in bandwidth, which supports the better actual
performance. Molecular dynamics codes tend not to scale too well
and are latency sensitive.
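
A crude way to see the latency sensitivity is to model the time per MD
step as compute work that divides across CPUs plus a per-step
communication cost that does not. The constants below are invented,
purely to show the shape of the curve, not measured from gromacs:

  /* toy model: t_step(N) = t_compute/N + t_comm, so efficiency falls
     off even with perfect load balance and infinite bandwidth.        */
  #include <stdio.h>

  int main(void)
  {
      double t_compute = 900.0;   /* ms/step of divisible work (made up) */
      double t_comm    =  60.0;   /* ms/step of fixed comm cost (made up) */
      int    n;

      for (n = 1; n <= 8; n *= 2) {
          double t_step  = t_compute / n + t_comm;
          double speedup = (t_compute + t_comm) / t_step;
          printf("%d cpu: speedup %.2f  efficiency %3.0f%%\n",
                 n, speedup, 100.0 * speedup / n);
      }
      return 0;
  }

The larger the fixed per-step cost (interconnect latency on the GbE
pair, or bus contention inside the quad), the faster the efficiency
falls off as you add CPUs.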

>
>Here's the machine: (1.6 Ghz xeons, 8 gb ram)
>
>http://www.dell.com/us/en/esg/topics/esg_pedge_rackmain_servers_3_pedge_6650.htm

I will take a look at it.

rbw



