[Beowulf] Barcelona numbers
Bill Broadley
bill at cse.ucdavis.edu
Mon Sep 10 17:34:42 PDT 2007
Vincent Diepeveen wrote:
> that simple C program that measures latency,
> can you try it with a more realistic working set size also
> to measure RAM latency, so with like 2GB in total or so?
I think it measures RAM latency quite well, but it doesn't exercise the TLB as
hard as a 2GB dataset would. Eight threads randomly accessing 2GB is a TLB
nightmare. I don't believe the kernel I'm using supports the 1GB pages
available on the Barcelona chips.
In any case, sure, I'll run 2GB numbers.
Opteron 2350 (2.0 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function Rate (MB/s) Avg time Min time Max time
Copy: 15328.3395 0.0921 0.0919 0.0922
Scale: 15297.8845 0.0921 0.0920 0.0922
Add: 14787.7337 0.1432 0.1428 0.1437
Triad: 15067.3052 0.1403 0.1402 0.1404
-------------------------------------------------------------
Solution Validates
gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 9.174 seconds, effective latency=136.70 ns.
With 2 thread(s), max latency was 9.186 seconds, effective latency=68.44 ns.
With 4 thread(s), max latency was 9.763 seconds, effective latency=36.37 ns.
With 8 thread(s), max latency was 10.589 seconds, effective latency=19.72 ns.
Opteron 275 (2.2 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function Rate (MB/s) Avg time Min time Max time
Copy: 8607.2317 0.0189 0.0186 0.0215
Scale: 8637.8088 0.0186 0.0185 0.0186
Add: 8249.3994 0.0291 0.0291 0.0292
Triad: 8244.0621 0.0301 0.0291 0.0372
gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 7.737 seconds, effective latency=115.29 ns.
With 2 thread(s), max latency was 7.722 seconds, effective latency=57.53 ns.
With 4 thread(s), max latency was 16.174 seconds, effective latency=60.25 ns.
Previously, when the DDR-2 Opteron systems were new, a fair number of people
posted STREAM numbers for the Opterons and Intels of the time. My vague
memory is that Intel was in the 7-9 GB/sec range and the DDR-2 Opterons were
in the 12.5-13.0 GB/sec range.