[Beowulf] Barcelona numbers

Bill Broadley bill at cse.ucdavis.edu
Mon Sep 10 17:34:42 PDT 2007


Vincent Diepeveen wrote:
> that simple C program that measures latency,
> can you try it with a more realistic working set size also
> to measure RAM latency, so with like 2GB in total or so?

I think it measures RAM latency quite well, but it doesn't exercise the TLB as
hard as a 2GB dataset would.  Eight threads randomly accessing 2GB is a TLB
nightmare.  I don't believe the kernel I'm using supports the 1GB pages
available on the Barcelona chips.

In any case, sure, I'll run the 2GB numbers.

Opteron 2350 (2.0 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       15328.3395       0.0921       0.0919       0.0922
Scale:      15297.8845       0.0921       0.0920       0.0922
Add:        14787.7337       0.1432       0.1428       0.1437
Triad:      15067.3052       0.1403       0.1402       0.1404
-------------------------------------------------------------
Solution Validates

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 9.174 seconds, effective latency=136.70 ns.
With 2 thread(s), max latency was 9.186 seconds, effective latency=68.44 ns.
With 4 thread(s), max latency was 9.763 seconds, effective latency=36.37 ns.
With 8 thread(s), max latency was 10.589 seconds, effective latency=19.72 ns.

Opteron 275 (2.2 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        8607.2317       0.0189       0.0186       0.0215
Scale:       8637.8088       0.0186       0.0185       0.0186
Add:         8249.3994       0.0291       0.0291       0.0292
Triad:       8244.0621       0.0301       0.0291       0.0372

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 7.737 seconds, effective latency=115.29 ns.
With 2 thread(s), max latency was 7.722 seconds, effective latency=57.53 ns.
With 4 thread(s), max latency was 16.174 seconds, effective latency=60.25 ns.

When the DDR-2 Opteron systems were new, a fair number of people posted
STREAM numbers for the Opterons and Intels of the time.  My vague memory is
that the Intels were in the 7-9GB/sec range and the DDR-2 Opterons were in
the 12.5-13.0GB/sec range.





