[Beowulf] Barcelona numbers
Bill Broadley bill at cse.ucdavis.edu
Mon Sep 10 17:34:42 PDT 2007
- Previous message: [Beowulf] Barcelona numbers
- Next message: [Beowulf] AMD Barcelona Launch
Vincent Diepeveen wrote:
> that simple C program that measures latency,
> can you try it with a more realistic working set size also
> to measure RAM latency, so with like 2GB in total or so?

I think it measures RAM latency quite well, but doesn't exercise the TLB as hard as a 2GB dataset would. 8 threads randomly accessing 2GB is a TLB nightmare. I do not believe the kernel I'm using has the 1GB pages available on the Barcelona chips.

In any case, sure, I'll run 2GB numbers.

Opteron 2350 (2.0 GHz):

pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         15328.3395    0.0921     0.0919     0.0922
Scale:        15297.8845    0.0921     0.0920     0.0922
Add:          14787.7337    0.1432     0.1428     0.1437
Triad:        15067.3052    0.1403     0.1402     0.1404
-------------------------------------------------------------
Solution Validates

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 9.174 seconds, effective latency=136.70 ns.
With 2 thread(s), max latency was 9.186 seconds, effective latency=68.44 ns.
With 4 thread(s), max latency was 9.763 seconds, effective latency=36.37 ns.
With 8 thread(s), max latency was 10.589 seconds, effective latency=19.72 ns.

Opteron 275 (2.2 GHz):

pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         8607.2317     0.0189     0.0186     0.0215
Scale:        8637.8088     0.0186     0.0185     0.0186
Add:          8249.3994     0.0291     0.0291     0.0292
Triad:        8244.0621     0.0301     0.0291     0.0372

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 7.737 seconds, effective latency=115.29 ns.
With 2 thread(s), max latency was 7.722 seconds, effective latency=57.53 ns.
With 4 thread(s), max latency was 16.174 seconds, effective latency=60.25 ns.
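
For anyone reading the latency lines above, the "effective latency" figures appear to be the slowest thread's wall time divided by the total number of accesses across all threads. The sketch below only redoes that arithmetic on the numbers printed above. It is not taken from plat.c, and the function name and dictionary layout are made up for illustration.

```python
# Sketch only: reproduces the arithmetic that seems to sit behind the
# "effective latency" column. Not the plat.c source; names are hypothetical.

ACCESSES_PER_THREAD = 67_108_864  # "Each thread accesses 67108864 INTs"


def effective_latency_ns(max_latency_s, accesses_per_thread, threads):
    """Slowest thread's wall time divided by all accesses made, in ns."""
    total_accesses = accesses_per_thread * threads
    return max_latency_s / total_accesses * 1e9


# thread count -> "max latency" in seconds, copied from the output above
runs = {
    "Opteron 2350": {1: 9.174, 2: 9.186, 4: 9.763, 8: 10.589},
    "Opteron 275": {1: 7.737, 2: 7.722, 4: 16.174},
}

for machine, results in runs.items():
    for threads, seconds in results.items():
        ns = effective_latency_ns(seconds, ACCESSES_PER_THREAD, threads)
        print(f"{machine}: {threads} thread(s) -> {ns:.2f} ns per access")
```

Read this way, the single-thread figure is the closest thing to a raw memory latency, while the multi-thread figures measure how many independent misses the machine can keep in flight at once.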
Previously, when the Opteron DDR-2 systems were newish, a fair number of people posted stream numbers for the Opterons and Intels of the time. My vague memory is that Intel was in the 7-9 GB/sec range and the DDR-2 Opterons were in the 12.5-13.0 GB/sec range.
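
When comparing stream numbers from different posts, it can help to know how the MB/s column is derived. The sketch below assumes the accounting used by the standard stream.c: 8-byte elements, two words moved per iteration for Copy and Scale, three for Add and Triad, rates reported in units of 10^6 bytes/s from the minimum time, and "Total memory required" covering three arrays in units of 2^20 bytes. The array length is not shown in the output above, so the value here is inferred from the 2014.2 MB line and should be treated as an assumption. The function names are hypothetical.

```python
# Sketch only: assumes the byte accounting of the standard stream.c.
# The array length is inferred, not taken from the posted output.

BYTES_PER_WORD = 8  # assumed: STREAM arrays of doubles

# Words read plus written per loop iteration for each kernel
WORDS_PER_ITERATION = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def elements_from_total_mb(total_mb):
    """Array length implied by 'Total memory required' (3 arrays, 2**20-byte MB)."""
    return total_mb * 2**20 / (3 * BYTES_PER_WORD)


def stream_rate_mb_s(kernel, n_elements, seconds):
    """Bandwidth in units of 10**6 bytes per second for one kernel."""
    bytes_moved = WORDS_PER_ITERATION[kernel] * BYTES_PER_WORD * n_elements
    return bytes_moved / seconds / 1e6


n = round(elements_from_total_mb(2014.2))  # roughly 88 million elements
print(f"implied array length: {n}")

# Minimum times from the Opteron 2350 run; they are rounded to four
# decimals, so the recomputed rates only match to about 0.1%.
for kernel, min_time in [("Copy", 0.0919), ("Add", 0.1428)]:
    print(f"{kernel}: {stream_rate_mb_s(kernel, n, min_time):.1f} MB/s")
```

Dividing the MB/s figures by 1000 gives the GB/sec units used in the ranges recalled above.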
