Take any two: motherboard performance, compatibility, value

Adam Lazur alazur at plogic.com
Fri Jun 30 08:30:46 PDT 2000

Lechner, David (Lechner at drs-esg.com) said:
> Do I have it right in that the STREAM functions used in the benchmark are
> really better suited for a single processor, and thus when the OS tries to
> allocate and divide the work then extra work has to be done coordinating the
> work (looks like a 10% hit beyond linear division of the bandwidth by 2)?
> Do I interpret the timing results to see that the single processor function
> is much faster than the dual processor function for the same work?
> Is this because of the nature of the data set (such as very large matrix
> manipulation) that the second processor is not only unable to help at all
> but actually makes things twice as bad by getting involved?
> Or is it because 2 versions are running in parallel, getting into trouble,
> and then taking a little longer than 2x as long to do 2x as much work?

I believe it's the latter. When two copies of stream run at the same time
(see shell hack details below), they compete for memory bandwidth. The
dual-processor numbers are approximately the same for both copies of
stream.

> (Adam/Doug - Were you using an auto-parallelizing compiler for the dual
> processor version? Or the hack suggested to run 2 jobs at once?)

We used a shell hack (similar to the one suggested on the stream web page)
to run two jobs at once. I'd be very interested in running a threaded
version if anyone has a copy.
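
For illustration, here is a minimal sketch of this kind of shell hack. It
is not the exact script used for the posted numbers; the function name
run_pair, the OUTDIR variable, and the output file names are all
assumptions made for the example.

```shell
# Run two copies of the same command concurrently, one log file per copy.
# OUTDIR (hypothetical) selects where the logs go; defaults to the
# current directory.
run_pair() {
    out=${OUTDIR:-.}

    # Start both copies in the background so they overlap in time and
    # contend for memory bandwidth.
    "$@" > "$out/copy1.out" 2>&1 &
    pid1=$!
    "$@" > "$out/copy2.out" 2>&1 &
    pid2=$!

    # Collect each copy's exit status; succeed only if both succeeded.
    s1=0
    s2=0
    wait "$pid1" || s1=$?
    wait "$pid2" || s2=$?
    [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
}
```

Assuming the binary sits in the current directory, this would be called
as "run_pair ./stream_wall", and the two sets of results then compared
from copy1.out and copy2.out.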

Also, I've been told I should have mentioned that we compiled stream_wall
from source with egcs-2.91.66 using the -O2 option.


[                  Adam Lazur <alazur at plogic.com>                     ]
[      Paralogic Inc. - www.plogic.com - www.xtreme-machines.com      ]
