[Beowulf] AMD64 results...

Robert G. Brown rgb at phy.duke.edu
Thu Dec 16 05:08:21 PST 2004


On Wed, 15 Dec 2004, Bill Broadley wrote:

> Group reply:
> 
> On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote:
> > Just for those of you who were asking after AMD64's as viable compute
> > platforms, I just ran stream and the bogomflops benchmark in my renamed
> > "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+
> 
> That is a s754 amd64?  

Yes (as per earlier discussion, an Asus K8NE, but I should have restated
it -- the P2 is an MSI mobo but I'm downstairs and don't remember which
one).

> 
> > They are all below.  Executive summary is that the AMD barely beats
> > (real) clock speed scaling compared to the P2 for stream.  I suspect
> > that this is not yet the end of the story, though, as I see little
> > difference between the i386 benchmark results and the x86_64 results
> > when running the program compiled both ways on metatron.
> 
> Double registers only help if you need them.  Most codes won't
> automatically utilize native 64 bit ints or pointers to any
> significant advantage.

Well, stream is as much a memory bandwidth test as it is a floating
point test per se anyway.  I always hope for something dramatic when I
use faster/wider memory, but usually reality is fairly sedate.
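
For anyone who hasn't looked inside stream, the bandwidth figure is just
bytes moved divided by the best time. Below is a rough sketch of that
accounting, not stream's own code. It assumes 8-byte doubles, two arrays
touched per element for Copy and Scale, three for Add and Triad, and
1 MB = 10^6 bytes. The function names are mine, made up for illustration.
The example at the bottom runs the formula backwards on Bill's gcc-3.2.3 -O1
Copy line (quoted further down) to estimate his array length:

```python
# Bytes-moved accounting for STREAM-style kernels (illustrative sketch only).
BYTES_PER_DOUBLE = 8
# Arrays read or written per element by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, n_elements, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass of a kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n_elements
    return bytes_moved / seconds / 1e6


def elements_from_rate(kernel, rate_mb_s, seconds):
    """Invert the formula: estimate array length from a reported rate and time."""
    bytes_moved = rate_mb_s * 1e6 * seconds
    return bytes_moved / (ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE)


# Bill's gcc-3.2.3 -O1 Copy line: 2206.8823 MB/s with a 0.2900 s minimum time.
n = elements_from_rate("Copy", 2206.8823, 0.2900)
# Assumes three arrays in total and that "MB" in his message means 2**20 bytes.
total_mib = 3 * BYTES_PER_DOUBLE * n / 2**20
print(f"about {n / 1e6:.1f} million doubles per array, "
      f"{total_mib:.1f} MiB in three arrays")
```

Under those assumptions the total comes out close to the 915.5MB Bill
quotes, which suggests the accounting is at least consistent with his
numbers.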

> > The INTERESTING story is in bogomflops, which includes division.  There
> > metatron was a whopping 2.8x faster than lucifer, while its clock is
> > only 1.33x faster.  It more than doubled its relative clockspeed
> > advantage, so to speak.  One can see how having 64 bits would really
> > speed up 64 bit division compared to doing it in software across
> > multiple 32 bit registers...
> 
> Interesting data point.
> 
> > Hope this is interesting/useful to somebody.  I put "real stream" at the
> > very end.  "real stream" uses the best time where benchmaster uses the
> > average time so benchmaster results are typically a few percent lower
> > (and likely just that much more realistic as well).
> 
> Similar data points for an opteron, dual (stream using 1 cpu) 2.2 GHz,
> with PC3200 memory (915.5MB array).  Not sure why the timer is so lousy,
> I had to make the array large to get a reasonably accurate time:

You should try stream inside my leedle harness that uses the CPU cycle
counter clock.  It autotunes iterations and so forth and generates an SD
as well as mean.

That's the "clock granularity" thing in my test results.  Note that it
is 3 nsec on the AMD64 and almost 100 nsec on the P2.  This is also an
interesting data point -- it suggests that integer instructions may be
considerably faster on the AMD64.  I'll have to run a mixed code program
to find out, though.
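
To make the idea concrete, here is a minimal sketch of the two pieces:
estimating timer granularity, and autotuning the iteration count so that
each sample is long compared to that granularity. This is NOT benchmaster's
actual code. It uses Python's time.perf_counter_ns() where the real thing
uses the CPU cycle counter, and the function names, the factor of 1000 and
the toy workload are all assumptions of mine:

```python
import statistics
import time


def clock_granularity_ns(samples=10000):
    """Smallest nonzero step seen between back-to-back timer reads.

    Note this includes the overhead of the timer call itself.
    """
    smallest = None
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        while t1 == t0:  # spin until the clock visibly ticks
            t1 = time.perf_counter_ns()
        delta = t1 - t0
        if smallest is None or delta < smallest:
            smallest = delta
    return smallest


def autotuned_timing(work, min_interval_ns, repeats=10):
    """Double the iteration count until one sample spans min_interval_ns,
    then return (mean, sd) of the per-call time in nanoseconds."""
    iterations = 1
    while True:
        t0 = time.perf_counter_ns()
        for _ in range(iterations):
            work()
        if time.perf_counter_ns() - t0 >= min_interval_ns:
            break
        iterations *= 2
    per_call = []
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        for _ in range(iterations):
            work()
        per_call.append((time.perf_counter_ns() - t0) / iterations)
    return statistics.mean(per_call), statistics.stdev(per_call)


granularity = clock_granularity_ns()
# Ask for samples at least 1000 ticks long so quantisation error is ~0.1%.
mean_ns, sd_ns = autotuned_timing(lambda: sum(range(100)), 1000 * granularity)
print(f"granularity ~{granularity} ns, work = {mean_ns:.1f} +/- {sd_ns:.1f} ns")
```

The point of the autotuning step is that a coarse timer forces long
samples (hence Bill's large array), while a fine-grained one lets you time
short runs many times and report a spread as well as a mean.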

> I suspect the numbers below would be higher if I had a uniprocessor system
> (never have a remote memory access or wait for the memory coherency)
> or with a 2.6 kernel (which is better about ensuring that a page and the
> process acting on that page are on the same cpu).
> 
> Kudos for the pathscale-1.4 compiler with -O3.

Now that's something to try.  I still haven't started the thirty-day
free trial that I signed up for two months ago (I should know better
than to do that right before the end of classes).  I've got almost a
good month of reduced teaching ahead -- maybe I'll start it now.  From
everything I've heard and seen, I'm going to end up buying a license or
two anyway -- it seems like it is just a really, really good product
being maintained by some very serious people.

Of course, a factor of two in speed (for certain code) for the cost of a
software license is a hell of a lot cheaper than buying a cluster twice
as large.  That helps.
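
For stream itself, the gain in Bill's tables below can be checked with a
few lines. This is only a sketch: the rates are copied from his quoted
gcc-3.4.3 -O3 and Pathscale-1.4 -O3 results, and the variable names are
mine:

```python
# Rates in MB/s copied from Bill's quoted tables (both at -O3).
gcc_343_O3 = {"Copy": 2285.6561, "Scale": 2285.6581,
              "Add": 2341.4071, "Triad": 2285.6555}
pathscale_14_O3 = {"Copy": 3764.6831, "Scale": 3764.6831,
                   "Add": 4173.8781, "Triad": 4173.8781}

# Per-kernel ratio of Pathscale rate to gcc rate.
speedup = {k: pathscale_14_O3[k] / gcc_343_O3[k] for k in gcc_343_O3}
for kernel, ratio in speedup.items():
    print(f"{kernel:6s} {ratio:.2f}x")
```

On these four kernels the ratio lands somewhat short of two, and of course
it says nothing about how other code would fare.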

   rgb

> 
> gcc-3.2.3 -O1:
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2206.8823       0.3010       0.2900       0.3800
> Scale:       2285.7067       0.2880       0.2800       0.3700
> Add:         2285.7087       0.4140       0.4200       0.5300
> Triad:       2285.7152       0.3910       0.4200       0.4700
> 
> -O2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        1777.7736       0.3240       0.3600       0.3600
> Scale:       1777.7783       0.3240       0.3600       0.3600
> Add:         1882.3495       0.4590       0.5100       0.5100
> Triad:       1882.3530       0.4590       0.5100       0.5100
> 
> -O3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        1777.7924       0.3260       0.3600       0.3700
> Scale:       1828.4723       0.3230       0.3500       0.3600
> Add:         1882.3679       0.4640       0.5100       0.5200
> Triad:       1846.1717       0.4720       0.5200       0.5300
> 
> gcc-3.4.3 -O1:
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        1729.6823       0.3330       0.3700       0.3700
> Scale:       1828.5184       0.3230       0.3500       0.3600
> Add:         1846.1048       0.4680       0.5200       0.5200
> Triad:       1846.1040       0.4680       0.5200       0.5200
> 
> -O2:
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2133.3337       0.2960       0.3000       0.3500
> Scale:       2133.3337       0.2980       0.3000       0.3500
> Add:         2232.5578       0.4270       0.4300       0.5100
> Triad:       2181.8132       0.4310       0.4400       0.5100
> 
> -O3:
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2285.6561       0.2630       0.2800       0.3600
> Scale:       2285.6581       0.2580       0.2800       0.3100
> Add:         2341.4071       0.3800       0.4100       0.4700
> Triad:       2285.6555       0.3880       0.4200       0.5200
> 
> Pathscale-1.4 -O1:
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        1999.9498       0.2880       0.3200       0.3200
> Scale:       2064.4625       0.2840       0.3100       0.3200
> Add:         2232.5009       0.3950       0.4300       0.4400
> Triad:       2232.4910       0.3930       0.4300       0.4400
> 
> -O2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2461.5205       0.2410       0.2600       0.2700
> Scale:       2285.6970       0.2530       0.2800       0.2900
> Add:         2341.4466       0.3730       0.4100       0.4200
> Triad:       2399.9765       0.3670       0.4000       0.4100
> 
> -O3
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        3764.6831       0.1540       0.1700       0.1800
> Scale:       3764.6831       0.1530       0.1700       0.1700
> Add:         4173.8781       0.2080       0.2300       0.2400
> Triad:       4173.8781       0.2110       0.2300       0.2400
> 
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




