[Beowulf] Opinions of Hyper-threading?
Bill Broadley
bill at cse.ucdavis.edu
Thu Feb 28 12:22:58 PST 2008
Mattijs Janssens wrote:
> How do your Rate numbers correlate to the max bandwidth of 32GB/s
> (http://en.wikipedia.org/wiki/GeForce_8_Series)?
>
> Or do these threads all operate on the same data?
My first guess was some kind of caching; after all, 2M floats is only 8MB. But
I couldn't reproduce it on my 8600GT, so I'm guessing it's a timing issue.
I downloaded the source, compiled:
/usr/local/cuda/bin/nvcc -O3 -o stream stream.cu
Ran it:
./stream
STREAM Benchmark implementation in CUDA
Array size (single precision)=2000000
using 128 threads per block, 15625 blocks
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:       16596.1294    0.0010     0.0010     0.0010
Scale:      16581.7649    0.0010     0.0010     0.0010
Add:        18750.8822    0.0013     0.0013     0.0013
Triad:      18736.6081    0.0013     0.0013     0.0013
I made the array 4 times bigger:
STREAM Benchmark implementation in CUDA
Array size (single precision)=8000000
using 128 threads per block, 62500 blocks
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:       16706.3212    0.0039     0.0038     0.0044
Scale:      16666.2770    0.0046     0.0038     0.0100
Add:        18408.0866    0.0053     0.0052     0.0056
Triad:      18738.6603    0.0052     0.0051     0.0055
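
As a sanity check, the two tables above can be tied back to the array size and block counts. The sketch below assumes the CUDA port counts bytes the way the original STREAM benchmark does (two arrays touched for Copy and Scale, three for Add and Triad, with MB meaning 10^6 bytes); that assumption was not checked against stream.cu, and the function names are purely illustrative.

```python
# Number of arrays read or written per element, following the usual
# STREAM counting convention (assumed here, not verified in stream.cu).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_FLOAT = 4  # single precision


def bytes_moved(kernel, n):
    """Total bytes one pass of `kernel` moves over arrays of n floats."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_FLOAT * n


def implied_time(kernel, n, rate_mb_s):
    """Seconds per pass implied by a reported rate (1 MB = 1e6 bytes)."""
    return bytes_moved(kernel, n) / (rate_mb_s * 1e6)


def blocks(n, threads_per_block=128):
    """Blocks needed for one thread per element (ceiling division)."""
    return -(-n // threads_per_block)


if __name__ == "__main__":
    # Reported Copy rates from the two runs in this post.
    for n, copy_rate in ((2_000_000, 16596.1294), (8_000_000, 16706.3212)):
        t = implied_time("Copy", n, copy_rate)
        print(f"n={n}: {blocks(n)} blocks, Copy time ~ {t:.5f} s")
```

Under that convention the reported Copy rates imply roughly 0.00096 s and 0.0038 s per pass, which agrees with the printed minimum times of 0.0010 and 0.0038, and the block counts come out to 15625 and 62500 as in the output.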
STREAM numbers that are 50% of marketing numbers seem relatively common.
I'm not that familiar with CUDA, and this ran on a video card that happens to be
driving my 1920x1200 display. I might get better numbers if I turned off
compiz, let alone X11.
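
To put a number on the "50% of marketing numbers" remark, here is a minimal sketch that compares the measured rates from the first run with a peak figure. It assumes the 32GB/s quoted in the question is the right peak to compare against (the actual peak for a particular card may differ) and that 1 GB/s = 1000 MB/s; `fraction_of_peak` is a made-up helper name.

```python
def fraction_of_peak(rate_mb_s, peak_gb_s):
    """Measured rate as a fraction of a quoted peak (1 GB/s = 1000 MB/s)."""
    return rate_mb_s / (peak_gb_s * 1000.0)


if __name__ == "__main__":
    QUOTED_PEAK_GB_S = 32.0  # figure from the quoted question, an assumption
    measured = {
        "Copy": 16596.1294,
        "Scale": 16581.7649,
        "Add": 18750.8822,
        "Triad": 18736.6081,
    }
    for kernel, rate in measured.items():
        pct = 100.0 * fraction_of_peak(rate, QUOTED_PEAK_GB_S)
        print(f"{kernel}: {pct:.1f}% of quoted peak")
```

Against that 32GB/s figure, the rates above land between roughly 52% and 59%.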
Kudos to Nvidia for having a Linux-friendly toolchain that I could find,
download, install, and use to compile code with minimal hassle.