[Beowulf] [gorelsky at stanford.edu: CCL:dual-core Opteron 275 performance]
Mikhail Kuzminsky
kus at free.net
Wed Jul 6 11:02:31 PDT 2005
In message from Alan Louis Scheinine <scheinin at crs4.it> (Wed, 06 Jul
2005 16:12:25 +0200):
>
>I wrote:
> > > A quad-CPU board with single-core Opteron was
> > > nearly twice as fast as a dual-CPU board with dual-core Opteron,
>Mikhail Kuzminsky wrote:
> > But this result means that 4 Opteron cores are "equal in performance"
> > to 2 single-core Opterons. If that holds *exactly*, your program looks
> > as if it works "only" with RAM (I am assuming here that memory
> > throughput does not scale from a single-core Opteron to a dual-core
> > chip, which is, generally speaking, incorrect), and there are
> > practically no "memory-independent" computations!
>
>I did some other benchmarking tests: a two-chip board with dual-core chips,
>that is, 4 cores on the board, was in other cases 20 percent and 40 percent
>slower than two nodes of a cluster, each node with two single-core chips.
>Really, the first program is very dependent on main memory.
>It is a bit of an exaggeration to say that such a program has "practically
>no 'memory-independent' computations". Since both level 1 and level 2 cache
>are necessary on the Opteron, it seems evident that bandwidth to main memory
>is much less than the computational potential. There might be reuse of
>variables and some memory-independent computations in the program, but still
>the bandwidth to main memory is relatively narrow compared to the potential of
>the arithmetic units.
>
>My main point is, as I wrote, "your mileage may vary." I've heard from various
>people that "everybody is going to dual-core". I simply want to emphasize that
>the dual-core choice is not for everybody.
Ehh, it will be for everybody, simply because there will be *no* single-core
server microprocessors :-)
But I absolutely agree with you about memory-bandwidth-limited applications.
Today we have a choice.
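The argument quoted above can be put into a toy model. This is only a rough
sketch under loud assumptions that are not measurements from this thread: run
time is split into a CPU part that scales with the number of cores and a
memory part that scales with the number of sockets (one memory controller per
chip), and interconnect costs between cluster nodes are ignored. The names
run_time, slowdown and memory_fraction_from_slowdown are made up for
illustration.

```python
def run_time(cpu_work, mem_work, cores, sockets):
    """Toy model: CPU work divides over cores, memory work over sockets.

    Assumption: each socket has one memory controller whose bandwidth is
    shared by all cores on that chip.
    """
    return cpu_work / cores + mem_work / sockets


def slowdown(mem_fraction):
    """Run time of 2 dual-core chips relative to 4 single-core chips.

    mem_fraction is the share of single-core run time spent waiting on
    main memory (0.0 = pure compute, 1.0 = pure memory traffic).
    """
    cpu_work = 1.0 - mem_fraction
    four_single = run_time(cpu_work, mem_fraction, cores=4, sockets=4)
    two_dual = run_time(cpu_work, mem_fraction, cores=4, sockets=2)
    return two_dual / four_single


def memory_fraction_from_slowdown(ratio):
    """Invert the model: in this toy model slowdown = 1 + mem_fraction."""
    return min(max(ratio - 1.0, 0.0), 1.0)


if __name__ == "__main__":
    # Slowdowns mentioned in the thread, read here as run-time ratios:
    # about 1.2, about 1.4, and "nearly twice" (about 2.0).
    for ratio in (1.2, 1.4, 2.0):
        frac = memory_fraction_from_slowdown(ratio)
        print(f"slowdown {ratio:.1f} -> memory-bound fraction ~ {frac:.1f}")
```

In this model a slowdown of exactly 2 corresponds to a program that does
nothing but wait for main memory, while 20 and 40 percent slowdowns correspond
to memory-bound fractions of roughly 0.2 and 0.4. Real chips will deviate from
this, so treat the numbers as an illustration of the reasoning only.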
Yours
Mikhail
>In particular, I looked at profiling done by the compiler from PGI, pgf90;
>it managed to vectorize some rather complicated arithmetic expressions.
>This suggests to me that more programs than in the past will efficiently
>use very long vectors, for which the memory bandwidth is important.
>
>On this same theme, the programs that are impacted by bandwidth to main memory
>seem to hit a limit for single-core CPUs of about 2.0 GHz. Aside from the
>question of dual-core, what has been the experience of other people with
>regard to very fast single-core CPUs? For programs that have vectors longer
>than the size of L2 cache, is there a speed grade above which no gain is seen?
>
>Alan
>--
>
> Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
> Center for Advanced Studies, Research, and Development in Sardinia
>
> Postal Address:               | Physical Address for FedEx, UPS, DHL:
> ---------------               | -------------------------------------
> Alan Scheinine                | Alan Scheinine
> c/o CRS4                      | c/o CRS4
> C.P. n. 25                    | Loc. Pixina Manna Edificio 1
> 09010 Pula (Cagliari), Italy  | 09010 Pula (Cagliari), Italy
>
> Email: scheinin at crs4.it
>
> Phone: 070 9250 238 [+39 070 9250 238]
> Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220]
> Operator at reception: 070 9250 1 [+39 070 9250 1]
> Mobile phone: 347 7990472 [+39 347 7990472]
>