[Beowulf] Woodcrest Memory bandwidth
Joe Landman
landman at scalableinformatics.com
Mon Aug 14 13:02:40 PDT 2006
Mark Hahn wrote:
> kinda sucks, doesn't it? here's what I get for a not-new dual-275 with
> 8x1G PC3200 (I think):
>
> Function      Rate (MB/s)   RMS time     Min time     Max time
> Copy:         5714.6837       0.0840       0.0840       0.0841
> Scale:        5821.0766       0.0825       0.0825       0.0826
> Add:          6437.8226       0.1119       0.1118       0.1120
> Triad:        6414.2079       0.1123       0.1123       0.1124
Now that I am back from "da Yoo Pee" I can post some of my numbers.
Here is our dual-core Opteron 275.
4-threads
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         9999.3465       0.0356       0.0320       0.0360
Scale:        8888.4147       0.0360       0.0360       0.0360
Add:          9230.2533       0.0542       0.0520       0.0560
Triad:        9230.2321       0.0538       0.0520       0.0560
1-thread
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         4705.6130       0.0711       0.0680       0.0720
Scale:        4705.6130       0.0702       0.0680       0.0720
Add:          4615.1161       0.1067       0.1040       0.1080
Triad:        4444.1975       0.1080       0.1080       0.1080
These numbers are from a PathScale-compiled binary. I see slightly
higher single-thread numbers with binaries compiled by PGI 6.1-2, not
sure why. The ones compiled with 6.1-5/6 are worse :(
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6666.9379       0.0454       0.0300       0.0500
Scale:        4000.0610       0.0567       0.0500       0.0600
Add:          4285.7330       0.0758       0.0700       0.0800
Triad:        4285.7330       0.0747       0.0700       0.0900
Same binary on the Woodcrest (2.66 GHz):
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         5000.1240       0.0427       0.0400       0.0500
Scale:        5000.1240       0.0452       0.0400       0.0500
Add:          5000.0445       0.0685       0.0600       0.0800
Triad:        5000.0445       0.0712       0.0600       0.0800
Intel 9.1 compiled version (64 bit)
1-thread
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         4447.6829       0.1440       0.1439       0.1445
Scale:        4613.8072       0.1388       0.1387       0.1390
Add:          4256.9431       0.2256       0.2255       0.2259
Triad:        4187.6605       0.2294       0.2292       0.2302
2-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         7288.3813       0.0882       0.0878       0.0893
Scale:        7186.2381       0.0891       0.0891       0.0893
Add:          7085.0852       0.1357       0.1355       0.1365
Triad:        6916.0273       0.1389       0.1388       0.1392
3-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6589.2489       0.0989       0.0971       0.1001
Scale:        6528.4171       0.0988       0.0980       0.0997
Add:          6535.0076       0.1488       0.1469       0.1504
Triad:        6563.9202       0.1486       0.1463       0.1496
4-threads
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6645.4125       0.0965       0.0963       0.0976
Scale:        6994.6233       0.0916       0.0915       0.0917
Add:          6373.0207       0.1508       0.1506       0.1509
Triad:        6710.7522       0.1432       0.1431       0.1433
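One way to read the thread-scaling tables is to normalize each rate to
the 1-thread result. The sketch below does that for the Triad rows. It
assumes the Intel 9.1 tables were measured on the Woodcrest box, which
the ordering of this post suggests but does not state outright. The
dictionary names and the scaling() helper are made up for illustration.
The two machines were measured with different compilers, so only the
ratios within each machine are comparable.

```python
# Triad rates in MB/s, copied from the tables in this post.
# Assumption: the Intel 9.1 tables come from the Woodcrest machine.
opteron_pathscale = {1: 4444.1975, 4: 9230.2321}
woodcrest_intel = {1: 4187.6605, 2: 6916.0273, 3: 6563.9202, 4: 6710.7522}


def scaling(rates):
    """Return {threads: rate relative to the 1-thread rate}."""
    base = rates[1]
    return {threads: rate / base for threads, rate in sorted(rates.items())}


if __name__ == "__main__":
    machines = (
        ("Opteron 275 (PathScale)", opteron_pathscale),
        ("Woodcrest (Intel 9.1)", woodcrest_intel),
    )
    for name, rates in machines:
        for threads, ratio in scaling(rates).items():
            print(f"{name}: {threads} thread(s) -> {ratio:.2f}x")
```

Read this way, the Opteron roughly doubles from 1 to 4 threads, while
the Woodcrest numbers peak at 2 threads and do not improve beyond that.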
I may have been Bill's 10 GB/s source, and that may have been a mixup on
my part.
FWIW: the PathScale compiled binaries on this machine give
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:         7272.4071       0.0453       0.0440       0.0480
Scale:        7272.2298       0.0462       0.0440       0.0480
Add:          5999.6258       0.0827       0.0800       0.0840
Triad:        5999.6302       0.0831       0.0800       0.0840
and the PGI compiled ones give
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         6608.0161       0.0970       0.0969       0.0977
Scale:        4592.3298       0.1395       0.1394       0.1397
Add:          4259.8885       0.2262       0.2254       0.2269
Triad:        4244.0478       0.2269       0.2262       0.2273
They may be slightly different versions of the original source (notice
the labels on the columns), but the core measurements are the same.
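For reference, the Rate column in a STREAM table follows from the
bytes each kernel moves and the best (minimum) time: Copy and Scale
touch two arrays, Add and Triad touch three, and 1 MB is counted as
1e6 bytes. The sketch below assumes 8-byte doubles and uses that
relation to back out the array length implied by a reported row, which
is a cheap way to check whether two binaries ran the same problem
size. The function names are illustrative and not taken from the
STREAM source.

```python
# Bytes moved per array element by each kernel, assuming 8-byte doubles.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mb_s(kernel, n_elements, min_time_s):
    """Rate in MB/s (1 MB = 1e6 bytes) from array length and best time."""
    return 1.0e-6 * BYTES_PER_ELEMENT[kernel] * n_elements / min_time_s


def implied_elements(kernel, rate_mb_s, min_time_s):
    """Array length implied by a reported rate and minimum time."""
    return rate_mb_s * 1.0e6 * min_time_s / BYTES_PER_ELEMENT[kernel]


if __name__ == "__main__":
    # Copy rows taken from tables in this post: (label, rate, min time).
    rows = (
        ("PathScale, Opteron, 1 thread", 4705.6130, 0.0680),
        ("Intel 9.1, 1 thread", 4447.6829, 0.1439),
    )
    for label, rate, min_time in rows:
        n = implied_elements("Copy", rate, min_time)
        print(f"{label}: about {n / 1e6:.1f} million elements")
```

Because the printed times are rounded, the implied length is only
approximate, but it is enough to see that the binaries were not all
built with the same array size.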
On the Opteron 275, we have two memory nodes, each with multiple banks.
landman@dualcore:~/stream> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3
cpubind:
nodebind:
membind: 0 1
landman@dualcore:~/stream> numactl --hardware
available: 2 nodes (0-1)
node 0 size: 2015 MB
node 0 free: 1276 MB
node 1 size: 4025 MB
node 1 free: 2416 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
On the Woodcrest, it looks like a single memory node.
landman@woody:~> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3
cpubind:
nodebind:
membind: 0
landman@woody:~> numactl --hardware
available: 1 nodes (0-0)
node 0 size: 4017 MB
node 0 free: 2649 MB
node distances:
node   0
  0:  10
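To compare the two memory layouts without eyeballing the output, the
per-node sizes can be pulled out of the numactl --hardware text with a
few lines of Python. This is a minimal sketch that only understands
the "node N size: X MB" lines in the format shown above; other numactl
versions may print something different. The parse_numa_nodes name and
the two sample strings are illustrative.

```python
import re


def parse_numa_nodes(text):
    """Return {node_id: size_in_MB} from `numactl --hardware` style text."""
    sizes = {}
    pattern = r"^node (\d+) size: (\d+) MB$"
    for match in re.finditer(pattern, text, re.MULTILINE):
        sizes[int(match.group(1))] = int(match.group(2))
    return sizes


# Sample text copied from the two sessions in this post.
OPTERON = """available: 2 nodes (0-1)
node 0 size: 2015 MB
node 0 free: 1276 MB
node 1 size: 4025 MB
node 1 free: 2416 MB
"""

WOODCREST = """available: 1 nodes (0-0)
node 0 size: 4017 MB
node 0 free: 2649 MB
"""

if __name__ == "__main__":
    for name, text in (("Opteron 275", OPTERON), ("Woodcrest", WOODCREST)):
        nodes = parse_numa_nodes(text)
        print(f"{name}: {len(nodes)} memory node(s), sizes in MB: {nodes}")
```

The Opteron box also shows an uneven split between its two nodes,
which is worth keeping in mind when placing STREAM threads.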
I have it on good authority that with the other chipset (we have a
Blackford here), we should see higher numbers, though not exceeding
the Opteron 275.
When I have time, I will investigate this more and write about it on my
blog. FWIW, I am not seeing a clear performance picture emerging. I
have heard speculation and rumor from others, but I prefer measurement,
and my measurements, while consistent, are not exposing a nice and
meaningful picture where I can say "yes, it's faster" or "no, it isn't".
What I can say is that Woodcrest is interesting. It just may be
overhyped by a "compliant" media.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615