[Beowulf] Barcelona numbers
richard.walsh at comcast.net
Mon Sep 10 20:32:29 PDT 2007
Bill Broadley <bill at cse.ucdavis.edu> wrote:
> Dual-socket quad-core Opteron 2350s (2.0 GHz) running the current McCalpin's
> STREAM compiled with pathscale-3.0 -mp -O4:
> Total memory required = 228.9 MB.
> Function      Rate (MB/s)   Avg time   Min time   Max time
> Copy:         15355.3139     0.0104     0.0104     0.0105
> Scale:        15249.5885     0.0105     0.0105     0.0105
> Add:          14954.2883     0.0161     0.0160     0.0162
> Triad:        15061.2389     0.0160     0.0159     0.0160
So with all 8 cores across both sockets at work, you are seeing about 70% of peak, assuming
you are using DDR2-667 (the fastest you can get until "Phenom" comes out, I think).
That is a little better on a percentage basis than the socket 940 numbers.
That meets expectations. I am surprised by the latency numbers you provide, though.
Latencies of 90 to 100+ ns are quite a bit higher than I expected and are edging
up into Intel territory. Perhaps this is an L3 cache effect, a new layer in
Barcelona's path to memory. Then again, I see your 200-series numbers are
up there too ... I thought first-byte latencies were around 65 ns for Opteron. Am
I confused?
Anyway, if the latency numbers hold up, I would say this is not the greatest news for
Barcelona. We can anticipate faster clocks, which should help, but it makes you wonder what
things would have looked like with a larger shared L2 cache instead of an L3. This is
a synthetic test, of course; what compilers and users do to strip-mine for cache will
give a more realistic assessment. Perhaps that was the trade-off driving this
design. Can I continue to think of AMD as the first-byte latency king? ... ;-) ...
rbw
--
"Making predictions is hard, especially about the future."
Niels Bohr
--
Richard Walsh
Thrashing River Consulting--
5605 Alameda St.
Shoreview, MN 55126
Phone #: 612-382-4620