[Beowulf] AMD64 results...
Bill Broadley bill at cse.ucdavis.edu
Wed Dec 15 18:29:56 PST 2004
Group reply:

On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote:
> Just for those of you who were asking after AMD64's as viable compute
> platforms, I just ran stream and the bogomflops benchmark in my renamed
> "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+

Is that an s754 AMD64?

> They are all below. Executive summary is that the AMD barely beats
> (real) clock speed scaling compared to the P2 for stream. I suspect
> that this is not yet the end of the story, though, as I see little
> difference between the i386 benchmark results and the x86_64 results
> when running the program compiled both ways on metatron.

Double registers only help if you need them. Most codes won't
automatically utilize native 64 bit ints or pointers to any significant
advantage.

> The INTERESTING story is in bogomflops, which includes division. There
> metatron was a whopping 2.8x faster than lucifer, while its clock is
> only 1.33x faster. It more than doubled its relative clockspeed
> advantage, so to speak. One can see how having 64 bits would really
> speed up 64 bit division compared to doing it in software across
> multiple 32 bit registers...

Interesting data point.

> Hope this is interesting/useful to somebody. I put "real stream" at the
> very end. "real stream" uses the best time where benchmaster uses the
> average time so benchmaster results are typically a few percent lower
> (and likely just that much more realistic as well).

Similar data points for a dual Opteron (stream using 1 CPU), 2.2 GHz,
with PC3200 memory (915.5MB array). Not sure why the timer is so lousy;
I had to make the array large to get a reasonably accurate time.

I suspect the numbers below would be higher if I had a uniprocessor
system (never a remote memory access or a wait for memory coherency) or
a 2.6 kernel (which is better about ensuring that pages and the process
acting on them are on the same CPU).

Kudos for the pathscale-1.4 compiler with -O3.
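As a rough cross-check of the timer remark above, here is a minimal Python sketch of how a STREAM rate follows from a measured time, and how much a coarse timer can move it. Two things here are assumptions, not stated in the message: an array length of 40,000,000 doubles (inferred from the quoted 915.5MB total for three arrays) and a timer tick of 0.01 s (inferred from the reported min/max times all being multiples of 0.01). The function names are hypothetical.

```python
# Assumption: N = 40,000,000 doubles per array, inferred from the quoted
# 915.5MB total (3 arrays * 8 bytes * N / 2**20). Not stated in the message.
N = 40_000_000
BYTES_PER_DOUBLE = 8

# Arrays read or written per element by each kernel (standard STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_mb(n=N):
    """Total memory for the three STREAM arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_DOUBLE * n / 2**20


def stream_rate_mbs(kernel, seconds, n=N):
    """Rate in MB/s (1 MB = 1e6 bytes) for one pass of the given kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e6


def rate_bounds(kernel, seconds, tick=0.01, n=N):
    """(low, high) rate if the true time is within one tick of the reading.

    Assumption: tick = 0.01 s, guessed from the granularity of the times.
    """
    return (stream_rate_mbs(kernel, seconds + tick, n),
            stream_rate_mbs(kernel, seconds - tick, n))


if __name__ == "__main__":
    print(f"total memory: {total_mb():.1f} MB")
    print(f"Copy at 0.29 s: {stream_rate_mbs('Copy', 0.29):.1f} MB/s")
    low, high = rate_bounds("Copy", 0.17)
    print(f"Copy at 0.17 s: between {low:.0f} and {high:.0f} MB/s")
```

Under these assumptions, `stream_rate_mbs("Copy", 0.29)` comes out near 2206.9 MB/s, which lines up with the gcc-3.2.3 -O1 Copy figure, so the reported rates appear to be derived from the minimum time. At the fastest times (0.17 s), a single 0.01 s tick shifts the rate by roughly 6%, which is one reason a larger array helps.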
gcc-3.2.3 -O1:
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         2206.8823     0.3010     0.2900     0.3800
Scale:        2285.7067     0.2880     0.2800     0.3700
Add:          2285.7087     0.4140     0.4200     0.5300
Triad:        2285.7152     0.3910     0.4200     0.4700

-O2
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         1777.7736     0.3240     0.3600     0.3600
Scale:        1777.7783     0.3240     0.3600     0.3600
Add:          1882.3495     0.4590     0.5100     0.5100
Triad:        1882.3530     0.4590     0.5100     0.5100

-O3
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         1777.7924     0.3260     0.3600     0.3700
Scale:        1828.4723     0.3230     0.3500     0.3600
Add:          1882.3679     0.4640     0.5100     0.5200
Triad:        1846.1717     0.4720     0.5200     0.5300

gcc-3.4.3 -O1:
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         1729.6823     0.3330     0.3700     0.3700
Scale:        1828.5184     0.3230     0.3500     0.3600
Add:          1846.1048     0.4680     0.5200     0.5200
Triad:        1846.1040     0.4680     0.5200     0.5200

-O2:
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         2133.3337     0.2960     0.3000     0.3500
Scale:        2133.3337     0.2980     0.3000     0.3500
Add:          2232.5578     0.4270     0.4300     0.5100
Triad:        2181.8132     0.4310     0.4400     0.5100

-O3:
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         2285.6561     0.2630     0.2800     0.3600
Scale:        2285.6581     0.2580     0.2800     0.3100
Add:          2341.4071     0.3800     0.4100     0.4700
Triad:        2285.6555     0.3880     0.4200     0.5200

Pathscale-1.4 -O1:
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         1999.9498     0.2880     0.3200     0.3200
Scale:        2064.4625     0.2840     0.3100     0.3200
Add:          2232.5009     0.3950     0.4300     0.4400
Triad:        2232.4910     0.3930     0.4300     0.4400

-O2
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         2461.5205     0.2410     0.2600     0.2700
Scale:        2285.6970     0.2530     0.2800     0.2900
Add:          2341.4466     0.3730     0.4100     0.4200
Triad:        2399.9765     0.3670     0.4000     0.4100

-O3
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         3764.6831     0.1540     0.1700     0.1800
Scale:        3764.6831     0.1530     0.1700     0.1700
Add:          4173.8781     0.2080     0.2300     0.2400
Triad:        4173.8781     0.2110     0.2300     0.2400

--
Bill Broadley
Computational Science and Engineering
UC Davis
