[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Vincent Diepeveen diep at xs4all.nl
Mon Jul 4 08:59:40 PDT 2005

Opteron for most workloads scales better than Xeon of course.

A quad xeon has 1 memory controller and a dual core dual opteron has 2.

The Opteron has higher memory bandwidth and lower latency on TLB-thrashing accesses.

Effectively, if you run the following test:
  take a buffer of memory of, say, 400MB, randomly read 8 bytes at a time
from the buffer, and measure which machine does it faster, then the Opteron
eats the Xeon alive of course.

Testing methodology: each processor allocates a buffer of n bytes and
cross-attaches to the buffers of the other processors.

Of course we take a large buffer. Around 400MB is the working set size of
the hashtable which I use for my chess software (which randomly reads
8-64 bytes at a time from it).
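
A minimal single-threaded sketch of such a random-read test in C is shown here. It is not the program used for the numbers in this post: the function names, the xorshift PRNG, and the use of clock() are assumptions made for illustration, and it does not do the cross-attaching between processors described above. Each index depends on the value loaded just before it, so the reads cannot overlap.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64 PRNG (illustrative choice): cheap compared to a memory miss. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/*
 * Do `reads` random 8-byte reads from a buffer of `bytes` bytes.
 * Returns the average time per read in nanoseconds, or -1.0 on failure.
 * The figure includes the PRNG and modulo cost, not only the memory access.
 */
double random_read_ns(size_t bytes, size_t reads)
{
    size_t n = bytes / sizeof(uint64_t);
    if (n == 0 || reads == 0)
        return -1.0;

    uint64_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    /* Fill the buffer so every page is actually mapped before timing. */
    uint64_t seed = 0x9E3779B97F4A7C15ULL;
    for (size_t i = 0; i < n; i++)
        buf[i] = xorshift64(&seed);

    uint64_t rng = 88172645463325252ULL;
    uint64_t value = 0;
    clock_t start = clock();
    for (size_t i = 0; i < reads; i++) {
        /* Mix in the last loaded value so each read waits for the previous one. */
        size_t idx = (size_t)((xorshift64(&rng) ^ value) % n);
        value = buf[idx];
    }
    clock_t stop = clock();

    /* Store the result in a volatile so the compiler cannot drop the loop. */
    volatile uint64_t sink = value;
    (void)sink;

    free(buf);
    return (double)(stop - start) / CLOCKS_PER_SEC * 1e9 / (double)reads;
}
```

Calling random_read_ns((size_t)400 << 20, 10000000) from a main() would approximate the 400MB case on one processor. Because of the PRNG and modulo overhead, the result is an upper bound on the pure memory latency.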

   single cpu A64 : 91 ns  (cl2 memory)
   single cpu P4  : 220 ns (cl2 memory, bus overclocked)
   dual opteron   : 120 ns
   quad opteron   : 133 ns
   dual xeon      : 280 ns (800MHz bus)
   dual xeon      : 400 ns (533MHz bus)

So obviously, for things that do not fit in the L2 cache, the Opteron runs
away with it. Only if the executable in question was built with the Intel
C++ compiler, which will have optimized it to run faster on Intel
processors than on the Opteron, do the results not look too bad for the P4.
Yet that is a matter of optimizing it better for the Opteron, which most
software dudes do NOT do, as Intel delivers good support and AMD
historically didn't deliver *any* kind of support (they are improving now,
but even then their math libraries are so pathetic compared to the ease of
the Intel libraries that I can understand at least *that* part of the
problem).

Objectively, however, there is no question that for TLB-thrashing random
lookups of 8-64 bytes from a large buffer, the Opteron is over 2 times
faster than the Xeon.
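
As a quick check of that factor against the dual-processor rows of the table above, here is a small C sketch. The function names are illustrative only; the latencies are the ones listed in the table.

```c
#include <stdio.h>

/* How many times slower `slow_ns` is than `fast_ns`. */
double latency_ratio(double slow_ns, double fast_ns)
{
    return slow_ns / fast_ns;
}

/* Print the dual Xeon / dual Opteron ratios using the latencies from the table. */
void print_ratios(void)
{
    const double dual_opteron_ns = 120.0;
    const double dual_xeon_800_ns = 280.0;  /* 800MHz bus */
    const double dual_xeon_533_ns = 400.0;  /* 533MHz bus */

    printf("dual xeon (800MHz bus) / dual opteron: %.2f\n",
           latency_ratio(dual_xeon_800_ns, dual_opteron_ns));
    printf("dual xeon (533MHz bus) / dual opteron: %.2f\n",
           latency_ratio(dual_xeon_533_ns, dual_opteron_ns));
}
```

Both ratios come out above 2, roughly 2.3 for the 800MHz-bus Xeon and roughly 3.3 for the 533MHz-bus Xeon.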

For 3 reasons:
  a) it has an on-die memory controller,
  b) so a multi-processor system has MORE memory controllers
     (one per processor),
  c) the on-die memory controller is clocked higher than the memory
     chipset from Intel.


At 10:30 AM 7/1/2005 +0100, Kozin, I (Igor) wrote:
>It's great to see someone is brave enough to publish 
>a Gaussian benchmark.
>On the other hand the results are predictable:
>since the Xeon scales linearly from one to two
>you'd expect the Opteron to scale well too, wouldn't you?
>So the factor 1.95 comes from a comparison of four
>and two "cores" on a test which apparently performs
>well out of cache.
>> To add to the discussion about the performance of new dual-core
>> processors for computational chemistry applications,
>> the comparison of Intel and AMD dual-CPU based computers is shown at:
>> http://www.sg-chem.net/cluster/
>> As can be seen from the graph, the Gaussian 03 execution 
>> speed (test job
>> 397) on dual-core dual-CPU Opteron 275 workstation is faster 
>> by a factor of 1.95
>> as compared to the dual-CPU Xeon 3.2GHz 800MHz FSB machine.
>> -----------------
>> I would like to thank Ed Gasiorowski (AMD) and Mike Fay (Colfax
>> International) for their support.
>> Serge Gorelsky
>> ----------------------------------------------------------------
>>  Dr S.I. Gorelsky, Department of Chemistry, Stanford University
>>  Box 155, 333 Campus Drive, Stanford, CA 94305-5080 USA
>>  Phone: (650) 723-0041. Fax: (650) 723-0852.
>> ----------------------------------------------------------------