[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Alan Louis Scheinine scheinin at crs4.it
Wed Jul 6 07:12:25 PDT 2005

I wrote:
 > > A quad-CPU board with single-core Opteron was
 > > nearly twice as fast as a dual-CPU board with dual-core Opteron,
Mikhail Kuzminsky wrote:
  > But this result means that 4 Opteron cores are "equal in performance"
  > to 2 "single-core" Opterons. If that is *exactly* so,
  > your program appears to work "only" with RAM (I assume that
  > memory throughput does not scale from a single-core Opteron to a 2-core chip,
  > which, generally speaking, is incorrect), and there are
  > practically no "memory-independent" computations!

I ran some other benchmark tests as well. In those tests, a two-chip board with
dual-core chips (that is, 4 cores on the board) was 20 percent and 40 percent
slower than two cluster nodes, each node with two single-core chips.
Admittedly, the first program is very dependent on main memory.
It is a bit of an exaggeration, however, to say that such a program has "practically
no 'memory-independent' computations". The fact that the Opteron needs both level 1
and level 2 caches makes it evident that bandwidth to main memory falls well short
of what the arithmetic units could consume. The program may reuse variables and do
some memory-independent computation, but main-memory bandwidth remains narrow
relative to the potential of the arithmetic units.
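The 20 and 40 percent figures can be read through a crude model, offered only as an assumption-laden sketch. Each Opteron chip has its own on-die memory controller, so a board with two dual-core chips gives its 4 cores 2 controllers, while two nodes of two single-core chips give 4 cores 4 controllers. Suppose a fraction m of the single-core run time is spent waiting on memory, that this part doubles when two cores share a controller, and that the rest is unchanged; the run-time ratio is then (1 - m) + 2m = 1 + m. The function names are hypothetical, and "20 percent slower" is read here as a run-time ratio of 1.2.

```python
def slowdown(mem_fraction: float, cores_per_controller: int = 2) -> float:
    """Run-time ratio (shared controller / private controller) in a crude model.

    Assumption: only the memory-bound fraction of the run time stretches,
    by a factor equal to the number of cores sharing one memory controller.
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must lie in [0, 1]")
    return (1.0 - mem_fraction) + mem_fraction * cores_per_controller


def implied_mem_fraction(ratio: float, cores_per_controller: int = 2) -> float:
    """Invert slowdown(): the memory-bound fraction implied by an observed ratio."""
    return (ratio - 1.0) / (cores_per_controller - 1)


# Ratios read from the post: 1.2 and 1.4, and roughly 2.0 for the first program.
for ratio in (1.2, 1.4, 2.0):
    frac = implied_mem_fraction(ratio)
    print(f"run-time ratio {ratio:.1f} -> memory-bound fraction ~{frac:.2f}")
```

Under this model the first program, nearly twice as fast with one core per controller, would be almost entirely memory-bound, which is consistent with the observation above. Real codes also differ in cache behaviour and inter-chip traffic, which the model ignores.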

My main point is, as I wrote, "your mileage may vary." I've heard from various
people that "everybody is going to dual-core". I simply want to emphasize that
the dual-core choice is not for everybody. Moreover, looking at the profiling
done by PGI's compiler, pgf90, I saw that it managed to vectorize some rather
complicated arithmetic expressions. This suggests to me that more programs
than in the past will efficiently use very long vectors, for which memory
bandwidth is important.
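Why long vectorized loops lean on memory bandwidth can be sketched with a roofline-style bound, a standard model rather than anything the PGI output reports: a kernel's attainable rate is the lesser of the peak flop rate and its arithmetic intensity (flops per byte of main-memory traffic) times the memory bandwidth. The sketch assumes a triad-style loop, a(i) = b(i) + s*c(i), in double precision; the peak and bandwidth figures are placeholders, not Opteron measurements.

```python
BYTES_PER_DOUBLE = 8


def triad_intensity() -> float:
    """Flops per byte for a(i) = b(i) + s*c(i): 2 flops, 3 doubles moved.

    Write-allocate traffic for a(i) is ignored, so the true intensity is lower.
    """
    return 2 / (3 * BYTES_PER_DOUBLE)


def attainable_gflops(intensity: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline bound: the lesser of the compute ceiling and intensity * bandwidth."""
    return min(peak_gflops, intensity * bw_gbs)


def ridge_point(peak_gflops: float, bw_gbs: float) -> float:
    """Intensity (flops/byte) below which a kernel is limited by memory bandwidth."""
    return peak_gflops / bw_gbs


# Placeholder machine figures, NOT measurements: 4.0 GFLOP/s peak, 6.4 GB/s memory.
PEAK_GFLOPS, BW_GBS = 4.0, 6.4
ai = triad_intensity()
print(f"triad: {ai:.3f} flops/byte -> at most "
      f"{attainable_gflops(ai, PEAK_GFLOPS, BW_GBS):.2f} GFLOP/s")
print(f"ridge point: {ridge_point(PEAK_GFLOPS, BW_GBS):.3f} flops/byte")
```

Any loop whose intensity sits below the ridge point is capped by bandwidth no matter how well it vectorizes, which is why better vectorization can make memory bandwidth matter more, not less.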

On the same theme, programs that are limited by bandwidth to main memory
seem to stop gaining speed on single-core CPUs above about 2.0 GHz. Aside from the
question of dual-core, what has been other people's experience with very fast
single-core CPUs? For programs whose vectors are larger than the L2 cache, is
there a speed grade above which no gain is seen?
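The speed-grade question can be framed with a toy model, under the stated assumption that compute work scales with the core clock while time spent on main-memory traffic does not. If the two overlap poorly, t(f) = W/f + T_mem and each speed grade still helps, just less and less; if they overlap fully, t(f) = max(W/f, T_mem) and there is a hard knee at f = W/T_mem, above which no gain is seen. The workload numbers below (6 billion compute cycles, 3 s of memory time) are hypothetical and were chosen only to put the knee at 2.0 GHz.

```python
def run_time(freq_ghz: float, cycles_g: float, mem_s: float,
             overlapped: bool = False) -> float:
    """Model run time in seconds.

    cycles_g  : compute work in billions of core cycles (hypothetical)
    mem_s     : seconds of main-memory traffic, independent of core clock
    overlapped: if True, compute and memory overlap fully (take the max);
                otherwise the two times add.
    """
    compute_s = cycles_g / freq_ghz
    return max(compute_s, mem_s) if overlapped else compute_s + mem_s


def knee_ghz(cycles_g: float, mem_s: float) -> float:
    """Clock above which the fully overlapped model shows no further gain."""
    return cycles_g / mem_s


def marginal_gain(f_lo: float, f_hi: float, cycles_g: float, mem_s: float,
                  overlapped: bool = False) -> float:
    """Fractional speedup from moving from clock f_lo to clock f_hi."""
    return (run_time(f_lo, cycles_g, mem_s, overlapped)
            / run_time(f_hi, cycles_g, mem_s, overlapped)) - 1.0


grades = [1.8, 2.0, 2.2, 2.4, 2.6]  # GHz, illustrative speed grades
for lo, hi in zip(grades, grades[1:]):
    additive = 100 * marginal_gain(lo, hi, 6.0, 3.0)
    overlap = 100 * marginal_gain(lo, hi, 6.0, 3.0, overlapped=True)
    print(f"{lo:.1f} -> {hi:.1f} GHz: +{additive:.1f}% (additive), "
          f"+{overlap:.1f}% (overlapped)")
```

Real codes typically sit somewhere between the two cases, since prefetching and out-of-order execution hide part, but not all, of the memory time.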


  Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
  Center for Advanced Studies, Research, and Development in Sardinia

  Postal Address:               |  Physical Address for FedEx, UPS, DHL:
  ---------------               |  -------------------------------------
  Alan Scheinine                |  Alan Scheinine
  c/o CRS4                      |  c/o CRS4
  C.P. n. 25                    |  Loc. Pixina Manna Edificio 1
  09010 Pula (Cagliari), Italy  |  09010 Pula (Cagliari), Italy

  Email: scheinin at crs4.it

  Phone: 070 9250 238  [+39 070 9250 238]
  Fax:   070 9250 216 or 220  [+39 070 9250 216 or +39 070 9250 220]
  Operator at reception: 070 9250 1  [+39 070 9250 1]
  Mobile phone: 347 7990472  [+39 347 7990472]
