Take any two: motherboard performance, compatibility, value

Lechner, David Lechner at drs-esg.com
Fri Jun 30 09:18:21 PDT 2000

So the run takes a bit more than twice as long in dual mode because each
copy can access memory at only half the bandwidth.
And although there are two processors, the dual-mode run you did also has
twice the work (with only half the memory bandwidth per copy, plus contention).
As I said before, very interesting...
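
A rough back-of-envelope sketch of that reasoning, in Python. The bandwidth, byte count, and 10% contention overhead are illustrative assumptions, not measurements from this thread, and the even split of one shared bus between copies is a simplification.

```python
def run_time(bytes_moved, bandwidth, copies=1, contention=0.0):
    """Estimated wall time for each of `copies` concurrent STREAM copies.

    Assumes the copies split one shared memory bus evenly and that
    contention adds a fractional overhead on top (both are assumptions).
    """
    per_copy_bandwidth = bandwidth / copies
    return bytes_moved / per_copy_bandwidth * (1.0 + contention)


# Illustrative numbers only: 1 GB moved per copy, 400 MB/s shared bus.
single = run_time(1e9, 400e6)
dual = run_time(1e9, 400e6, copies=2, contention=0.10)
ratio = dual / single
print(f"single: {single:.2f} s, dual: {dual:.2f} s, ratio: {ratio:.2f}x")
```

Under these assumptions each of the two copies takes a little over twice the single-copy time, while the pair together gets through twice the work.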

Has anyone done this test with an auto-parallelized version of STREAM doing one
work set on 1, 2, 3, and 4 processors on an SMP box with a single shared
memory bank, and then done the same with multiple (N = # processors) copies
of the benchmark running concurrently (with contention)?

 Dave Lechner

-----Original Message-----
From: Adam Lazur [mailto:alazur at plogic.com]
Sent: Friday, June 30, 2000 11:31 AM
To: Lechner, David
Cc: Beowulf mailing list
Subject: Re: Take any two: motherboard performance, compatibility, value

Lechner, David (Lechner at drs-esg.com) said:
> Do I have it right in that the STREAM functions used in the benchmark are
> really better suited for a single processor, and thus when the OS tries to
> allocate and divide the work then extra work has to be done coordinating
> work (looks like a 10% hit beyond linear division of the bandwidth by 2) ?

> Do I interpret the timing results to see that the single processor
> is much faster than the dual processor function for the same work?
> Is this because of the nature of the data set (such as very large matrix
> manipulation) that the second processor is not only unable to help at all
> but actually makes things twice as bad by getting involved?
> Or is it because 2 versions are running in parallel, getting into trouble,
> and then taking a little longer than 2x as long to do 2x as much work?

I believe it's due to the latter. When there are two copies of stream
running (see shell hack details below) they are competing for memory
bandwidth. The dualproc numbers are approximately the same for both copies
of stream.

> (Adam/Doug - Were you using an auto-parallelizing compiler for the dual
> processor version? Or the hack suggested to run 2 jobs at once?)

We used a shell hack (similar to the one suggested on the stream web page)
to run two jobs at once. I'd be very interested in running a threaded
version if anyone has a copy.
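
For anyone wanting to repeat the "two jobs at once" run, here is a minimal sketch of the same idea in Python rather than shell. It is not the script we used; the function name `run_concurrently` is made up for illustration, and the command in the demo is a stand-in that should be replaced with the path to your own STREAM binary.

```python
import subprocess
import sys
import time


def run_concurrently(cmd, copies):
    """Start `copies` instances of `cmd` together.

    Returns (list of exit codes, wall-clock seconds until all finished).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # path to your own STREAM build, e.g. ["./stream_wall"] (hypothetical).
    cmd = [sys.executable, "-c", "pass"]
    for n in (1, 2):
        codes, elapsed = run_concurrently(cmd, n)
        print(f"{n} copies: exit codes {codes}, wall time {elapsed:.2f} s")
```

Each copy still prints its own results, so the per-copy bandwidth figures come from the benchmark output itself; the wall time reported here only shows how long the whole batch took.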

Oh, I've been told I should have included that we compiled stream_wall
from source with egcs-2.91.66 using the -O2 option.


[                  Adam Lazur <alazur at plogic.com>                     ]
[      Paralogic Inc. - www.plogic.com - www.xtreme-machines.com      ]

Beowulf mailing list
Beowulf at beowulf.org
