[Beowulf] Re: Opteron 275 performance

Robert G. Brown rgb at phy.duke.edu
Fri Jul 29 15:09:29 PDT 2005


Mark Hahn writes:

>> It seems there is an increasingly dominating opinion that the second 
>> core is meant to run anti-virus protection software 24 hours a day.
> 
> only in the sense that all CPUs are intended to run desktop-windows :|
> 
> seriously, if that were the case, then there would be little point to 
> providing both CPUs with giant 1MB caches.  ("giant" here is relative to 
> the amount of die area devoted to computation, of course!)
> 
> I find that most software has a pretty high flops-per-byte ratio, at least 
> as compared to Stream/daxpy.  dual-core K8's seem like a pretty clear win,
> though memory contention and higher single-core clocks can argue against.
> (I'm about to receive 1536 single-core AMD's...)

Well said, and precisely my opinion as well.  "Most" applications will
be able to achieve nearly linear speedup across two cores -- even some
that are fairly memory intensive -- just as they did back in the days
of dual-CPU Pentium Pro SMP systems with a heavily oversubscribed
memory bus and barely SMP-capable 2.0.x Linux kernels.  For that
reason dual-CPU nodes have pretty much always been the most common
building block for clusters -- the cheapest way to get raw aggregate
FLOPS.  However, ALSO as ALWAYS, YMMV, and it is quite possible that
stream-like applications with an FPB ratio well under one will suffer
from sharing a memory controller.  "Most" people with applications in
this category know that they are in it, though, and should proceed
with appropriate caution, and anybody who isn't sure should obviously
ALSO proceed with caution.
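
To put a toy example behind the FPB argument (a sketch only, not a
tuned benchmark, and both kernels are made up purely for
illustration): a daxpy-style loop moves roughly 24 bytes of memory
traffic per 2 flops (FPB well under one), while a loop grinding
transcendentals over the same data does many flops per byte touched
and leans on the memory system far less:

  /* Toy flops-per-byte (FPB) illustration -- a sketch, not a tuned
   * benchmark.  The daxpy loop is stream-like (about 2 flops per 24
   * bytes of traffic, FPB << 1); the transcendental loop does many
   * flops per byte touched (FPB >> 1).  Compile with something like
   * "gcc -O2 fpb.c -lm", then run one copy alone and two copies at
   * once (one per core) and watch which timing stretches.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <math.h>
  #include <sys/time.h>

  #define N (8*1024*1024)   /* 64 MB arrays, well past a 1MB cache */

  static double walltime(void)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6*tv.tv_usec;
  }

  int main(void)
  {
    double *x = malloc(N*sizeof(double));
    double *y = malloc(N*sizeof(double));
    double a = 3.14159, s = 0.0, t0;
    int i, k;

    for (i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    t0 = walltime();
    for (k = 0; k < 10; k++)                 /* stream-like daxpy */
      for (i = 0; i < N; i++) y[i] = a*x[i] + y[i];
    printf("daxpy:     %.3f s\n", walltime() - t0);

    t0 = walltime();
    for (i = 0; i < N; i++) s += sin(x[i]);  /* compute-heavy loop */
    printf("transcend: %.3f s (s=%g, printed to defeat the optimizer)\n",
           walltime() - t0, s);

    free(x); free(y);
    return 0;
  }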

A benchmark run of your own application(s) is worth any number of white
papers, SPEC results, theoretical computations of speed, stream numbers,
"MFLOPS" measurements, and (regrettably) free tee shirts, coffee cups,
and other corporate lucre from vendors.  If you don't KNOW how your
applications scale and utilize memory across lots of architectures, make
such a run.  If you DO know how they scale, make such a run anyway, if
you can.  The first time I ran code on an Opteron I was surprised by how
far it exceeded my expectations, so even if you think that you have a
handle on things you can be wrong.  There is so much nonlinearity and
complexity in computer design that performance on "new" architectures
(or significant variations of old ones) is very difficult to predict.
As Joe already noted...
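
For what little it is worth, here is the sort of trivial wall-clock
harness I mean (a sketch; my_application_kernel() is a made-up
stand-in for whatever your real code actually does):

  /* Bare-bones wall-clock timing harness -- a sketch.  The stub
   * my_application_kernel() is a hypothetical stand-in; replace its
   * body with a call into your own code, then compare the elapsed
   * time running one copy alone versus two copies at once (one per
   * core) on the box you're evaluating.
   */
  #include <stdio.h>
  #include <sys/time.h>

  static void my_application_kernel(void)
  {
    volatile double s = 0.0;   /* stand-in workload; put your code here */
    long i;
    for (i = 0; i < 100000000L; i++) s += (double)i;
  }

  static double walltime(void)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6*tv.tv_usec;
  }

  int main(void)
  {
    double t0 = walltime();
    my_application_kernel();
    printf("elapsed: %.3f seconds\n", walltime() - t0);
    return 0;
  }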

   rgb

P.S. -- I like the use of flops-per-byte (FPB) as a meaningful measure
here.  Being shameless, I may steal it for my own use in future
discussions and pretend that I made it up...:-)

P.P.S. -- ... although there is probably a more precisely relevant
measure in clocks (of computation) per byte, especially where the
clocks required to retrieve bytes are known in the various contexts,
since this too varies with stride, direction through memory, and
access pattern.  All that damn complexity again...;-)
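
(For the curious, a crude way to see the stride dependence -- a
sketch, not a careful memory benchmark, just enough to watch the
effective cost per element move around:)

  /* Crude stride-dependence demo: walk a large array at various
   * strides and report nanoseconds per element touched.  Not a
   * careful latency benchmark.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>

  #define N (16*1024*1024)   /* 128 MB of doubles */

  static double walltime(void)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6*tv.tv_usec;
  }

  int main(void)
  {
    double *a = malloc(N*sizeof(double));
    double s = 0.0, t0;
    long i, stride, touched;

    for (i = 0; i < N; i++) a[i] = 1.0;

    for (stride = 1; stride <= 1024; stride *= 2) {
      touched = 0;
      t0 = walltime();
      for (i = 0; i < N; i += stride) { s += a[i]; touched++; }
      printf("stride %5ld: %6.1f ns/element (s=%g)\n",
             stride, 1.0e9*(walltime() - t0)/touched, s);
    }
    free(a);
    return 0;
  }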

> 
> regards, mark hahn.
> 