rather unfortunate article on Mac
W Bauske
wsb at paralleldata.com
Fri Feb 1 12:09:06 PST 2002
Bill Broadley wrote:
>
> Just figured after hearing the 15 Gflop number I'd do a reality check
> with stream. I happen to have a dual g4-800 around, so I ran stream:
>
>             copy  scale   add  triad
> cc -O1       313    307   341    342
> cc -O2       319    306   341    342
> cc -O3       321    307   341    342
> cc -O4       319    305   341    342
>
> I happen to have a 1.2 GHz Athlon (pretty slow for these days) on
> a $65 motherboard:
>
>             copy  scale   add  triad
> gcc -O1      677    660   760    680
>
> At this rate the "15 GFlop" G4 can add 2 arrays at 28 Mflops single
> precision, or 14 Mflops double precision. About 1/2 of a low-end
> budget Athlon.
>
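For reference, the quoted Mflops figures can be reproduced from the STREAM
add number, assuming the table is in MB/s (which is what STREAM reports) and
counting three array accesses per add. A minimal sketch in C; the function
name add_mflops is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* STREAM's "add" kernel is c[i] = a[i] + b[i]: two loads and one store per
 * element, so each floating-point add moves 3 * element_size bytes.
 * Assumption: bandwidth_mb_s is the "add" column of the table, in MB/s. */
double add_mflops(double bandwidth_mb_s, size_t element_size)
{
    assert(element_size > 0);
    double bytes_per_flop = 3.0 * (double)element_size;
    /* (MB/s) / (bytes per flop) = Mflop/s */
    return bandwidth_mb_s / bytes_per_flop;
}
```

Under these assumptions, add_mflops(341.0, 8) comes out a little over 14 and
add_mflops(341.0, 4) a little over 28, which matches the figures quoted
above. The Athlon's 760 gives a bit under 32 for doubles.
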
That's what I meant about using AltiVec with a compiler. Your test likely
did not use that part of the chip. There are prefetch operations that can
speed up memory access substantially, and they go unused here. Everything
I've seen, though, indicates you have to use either macros or assembler to
get to them (on Linux at least). That is similar to SSE/SSE2 before Intel's
compilers were available (GCC supports them now too, I think).
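
To illustrate what an explicit prefetch in a STREAM-style add loop might look
like, here is a rough sketch. It does not use AltiVec's own stream-touch
operations; it swaps in GCC's portable __builtin_prefetch instead, so it is
only an approximation of the idea. The function name and the prefetch
distance are assumptions picked for illustration, and whether this helps on
a given machine would have to be measured:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to request.
 * The right value depends on the memory system and must be measured. */
#define PREFETCH_DISTANCE 64

/* c[i] = a[i] + b[i], with software prefetch hints for the two inputs. */
void add_with_prefetch(double *c, const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 0 = no temporal
             * locality (streamed data is not reused). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_DISTANCE], 0, 0);
        }
        c[i] = a[i] + b[i];
    }
}
```

The hints do not change the result, only (possibly) the timing, so the
output is the same as the plain loop's.
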
Wes