[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]

Mikhail Kuzminsky kus at free.net
Wed Jul 6 11:02:31 PDT 2005


In message from Alan Louis Scheinine <scheinin at crs4.it> (Wed, 06 Jul 
2005 16:12:25 +0200):
>
>I wrote:
> > > A quad-CPU board with single-core Opteron was
> > > nearly twice as fast as a dual-CPU board with dual-core Opteron,
>Mikhail Kuzminsky wrote:
>  > But this result means that 4 Opteron cores are "equal in performance"
>  > to 2 single-core Opterons. If that holds *exactly*, your program
>  > looks as if it works "only" with RAM (I suppose here that memory
>  > throughput doesn't scale from a single-core Opteron to a 2-core chip,
>  > which is, generally speaking, incorrect), and there are
>  > practically no "memory-independent" computations!
>
>I did some other benchmarking tests, a two-chip board with dual-core,
>that is, 4 cores on the board, was in other cases 20 percent and 40 percent
>slower than two nodes of a cluster, each node with two single-core chips.
>Really, the first program is very dependent on main memory.
>It is a bit of an exaggeration to say that such a program has "practically
>no 'memory-independent' computations".  Since both level 1 and level 2 cache
>are necessary on the Opteron, it seems evident that bandwidth to main memory
>is much less than the computational potential.  There might be reuse of
>variables and some memory-independent computations in the program, but still
>the bandwidth to main memory is relatively narrow compared to the potential of
>the arithmetic units.
>
>My main point is, as I wrote, "your mileage may vary."  I've heard from various
>people that "everybody is going to dual-core".  I simply want to emphasize that
>the dual-core choice is not for everybody.
Ehh, it will be for everybody, simply because there will be *no*
single-core server microprocessors :-)
But I absolutely agree with you about memory bandwidth-limited
applications.
Today we have a choice.
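
The scaling argument in this thread can be put into a small model. What
follows is only a sketch under stated assumptions, not a measurement: a
fraction f of the single-core runtime is limited by main-memory bandwidth,
memory bandwidth grows with the number of sockets (each Opteron chip has its
own memory controller) but not with the number of cores per socket, and
compute time and memory time do not overlap. The function names are
hypothetical.

```python
def runtime(f, cores, sockets):
    """Normalized runtime of a perfectly parallel job.

    f       -- assumed memory-bound fraction of the single-core runtime (0..1)
    cores   -- total number of cores used
    sockets -- number of chips, each assumed to bring its own memory bandwidth
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if cores < 1 or sockets < 1:
        raise ValueError("cores and sockets must be positive")
    compute_time = (1.0 - f) / cores  # arithmetic scales with cores
    memory_time = f / sockets         # memory traffic scales with sockets only
    return compute_time + memory_time


def dual_core_slowdown(f):
    """Runtime ratio: 2 sockets x 2 cores versus 4 sockets x 1 core."""
    return runtime(f, cores=4, sockets=2) / runtime(f, cores=4, sockets=4)


def memory_fraction_from_slowdown(ratio):
    """Invert the model: in this model the ratio equals 1 + f."""
    if not 1.0 <= ratio <= 2.0:
        raise ValueError("the model only covers ratios between 1 and 2")
    return ratio - 1.0


if __name__ == "__main__":
    for f in (0.0, 0.2, 0.4, 1.0):
        print(f"f = {f:.1f}: dual-core board takes "
              f"{dual_core_slowdown(f):.2f}x the single-core runtime")
```

In this model a purely memory-bound program (f = 1) takes twice as long on
the 2 x dual-core board as on four single-core chips, and the board then
performs like two single-core chips. If "20 percent and 40 percent slower"
is read as a runtime ratio of 1.2 and 1.4 (an assumption), the model would
put f at about 0.2 and 0.4 for those programs.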

Yours
Mikhail

>In particular, I looked at profiling
>done by the compiler from PGI, pgf90; it managed to vectorize some rather
>complicated arithmetic expressions.  This suggests to me that more programs
>than in the past will efficiently use very long vectors for which the memory
>bandwidth is important.
>
>On this same theme, the programs that are impacted by bandwidth to main memory
>seem to hit a limit for single-core CPUs of about 2.0 GHz.  Aside from the
>question of dual-core, what has been the experience of other people with
>regard to very fast single-core CPUs?  For programs that have vectors longer
>than the size of L2 cache, is there a speed grade above which no gain is seen?
>
>Alan
>-- 
>
>  Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
>  Center for Advanced Studies, Research, and Development in Sardinia
>
>  Postal Address:               |  Physical Address for FedEx, UPS, DHL:
>  ---------------               |  -------------------------------------
>  Alan Scheinine                |  Alan Scheinine
>  c/o CRS4                      |  c/o CRS4
>  C.P. n. 25                    |  Loc. Pixina Manna Edificio 1
>  09010 Pula (Cagliari), Italy  |  09010 Pula (Cagliari), Italy
>
>  Email: scheinin at crs4.it
>
>  Phone: 070 9250 238  [+39 070 9250 238]
>  Fax:   070 9250 216 or 220  [+39 070 9250 216 or +39 070 9250 220]
>  Operator at reception: 070 9250 1  [+39 070 9250 1]
>  Mobile phone: 347 7990472  [+39 347 7990472]
>