[Beowulf] [gorelsky@stanford.edu: CCL:dual-core Opteron 275 performance]
Alan Louis Scheinine, scheinin at crs4.it, Wed Jul 6 07:12:25 PDT 2005
I wrote:
> > A quad-CPU board with single-core Opteron was
> > nearly twice as fast as a dual-CPU board with dual-core Opteron,

Mikhail Kuzminsky wrote:
> But this result means, that 4 cores of Opteron are "equal by performance"
> to 2 "single core" Opterons. If it'll be *exactly*,
> your program looks as working "only" w/RAM (I suppose that
> memory throughput don't scale from single core Opteron to 2-cores chip,
> what is, generally speaking, incorrect), and there is
> practically no "memory-independed" computations !

In other benchmark tests, a two-chip board with dual-core chips, that is, 4 cores on the board, was 20 percent slower in one case and 40 percent slower in another than two nodes of a cluster, each node with two single-core chips.

Really, the first program is very dependent on main memory. It is a bit of an exaggeration, though, to say that such a program has "practically no 'memory-independent' computations". The fact that the Opteron needs both a level 1 and a level 2 cache makes it evident that bandwidth to main memory is much lower than the computational potential of the chip. There may be some reuse of variables and some memory-independent computation in the program, but the bandwidth to main memory is still narrow compared to the potential of the arithmetic units.

My main point is, as I wrote, "your mileage may vary." I've heard from various people that "everybody is going to dual-core"; I simply want to emphasize that dual-core is not the right choice for everybody. In particular, I looked at the profiling information from the PGI compiler, pgf90: it managed to vectorize some rather complicated arithmetic expressions. This suggests to me that more programs than in the past will make efficient use of very long vectors, for which memory bandwidth is important. On the same theme, programs that are limited by bandwidth to main memory seem to hit a ceiling at about 2.0 GHz for single-core CPUs.
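A rough way to see why a memory-bound program stops gaining from extra cores or a faster clock is a roofline-style bound. The Python sketch below is a toy model, not a measurement: it assumes compute and memory traffic overlap perfectly, so each array element costs the larger of its compute time and its memory-transfer time, and it assumes the cores of one chip split a single memory bandwidth evenly. The function names and all numbers (8 cycles and 24 bytes per element, 6.0 GB/s) are hypothetical, picked only so that the crossover lands near the 2.0 GHz mentioned above.

```python
"""Toy roofline-style model of a bandwidth-bound loop.

All parameters are hypothetical illustration values, not measurements.
Assumes compute and memory traffic overlap perfectly, so each element
costs max(compute time, memory-transfer time). Caches, latency and
prefetching are ignored.
"""


def time_per_element(clock_ghz, cycles_per_element, bytes_per_element, bandwidth_gb_s):
    """Seconds to process one array element on one core."""
    compute_s = cycles_per_element / (clock_ghz * 1e9)
    memory_s = bytes_per_element / (bandwidth_gb_s * 1e9)
    return max(compute_s, memory_s)


def saturation_clock_ghz(cycles_per_element, bytes_per_element, bandwidth_gb_s):
    """Clock rate at which compute time equals memory time; above it,
    this model predicts no further gain from a faster CPU."""
    return cycles_per_element * bandwidth_gb_s / bytes_per_element


def speedup_from_cores(n_cores, clock_ghz, cycles_per_element, bytes_per_element, bandwidth_gb_s):
    """Throughput of n cores sharing one memory bandwidth, relative to one core."""
    one = time_per_element(clock_ghz, cycles_per_element, bytes_per_element, bandwidth_gb_s)
    # Hypothetical assumption: the shared bandwidth is split evenly among the cores.
    shared = bandwidth_gb_s / n_cores
    many = time_per_element(clock_ghz, cycles_per_element, bytes_per_element, shared)
    # n cores each finish one element every `many` seconds.
    return n_cores * one / many


if __name__ == "__main__":
    # Hypothetical loop: 8 cycles per element and 24 bytes of traffic per element
    # (e.g. a[i] = b[i] + s*c[i] on three 8-byte doubles, ignoring write-allocate),
    # with 6.0 GB/s of sustained bandwidth.
    CYCLES, BYTES, BW = 8, 24, 6.0
    print("saturation clock (GHz):", saturation_clock_ghz(CYCLES, BYTES, BW))
    for ghz in (1.0, 1.6, 2.0, 2.4, 2.8):
        t = time_per_element(ghz, CYCLES, BYTES, BW)
        s2 = speedup_from_cores(2, ghz, CYCLES, BYTES, BW)
        print(f"{ghz:.1f} GHz: {t * 1e9:.2f} ns/element, 2-core speedup {s2:.2f}")
```

With these assumed numbers, the time per element stops falling once the clock reaches 2.0 GHz, and a second core sharing the same bandwidth doubles throughput at 1.0 GHz but adds nothing from 2.0 GHz upward. A real program sits somewhere between these extremes, which is one way of reading "your mileage may vary."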
Aside from the question of dual-core, what has been the experience of other people with very fast single-core CPUs? For programs whose vectors are larger than the L2 cache, is there a speed grade above which no gain is seen?

Alan

--
Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
Center for Advanced Studies, Research, and Development in Sardinia

Postal Address:                 | Physical Address for FedEx, UPS, DHL:
---------------                 | -------------------------------------
Alan Scheinine                  | Alan Scheinine
c/o CRS4                        | c/o CRS4
C.P. n. 25                      | Loc. Pixina Manna Edificio 1
09010 Pula (Cagliari), Italy    | 09010 Pula (Cagliari), Italy

Email: scheinin at crs4.it

Phone: 070 9250 238 [+39 070 9250 238]
Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220]
Operator at reception: 070 9250 1 [+39 070 9250 1]
Mobile phone: 347 7990472 [+39 347 7990472]
More information about the Beowulf mailing list
