[Beowulf] Has anyone actually seen/used a cell system?

Mark Hahn hahn at physics.mcmaster.ca
Sun Oct 1 20:24:13 PDT 2006


>> The same site reports that the X6800, a 2.93 GHz Core2, sees
>> almost 12.5 SP GFLOPS using ScienceMark 2.0 (6.2 DP GFLOPS).

hmm, those numbers are pretty low - peak should be 2.93*4 (DP) or 2.93*8 (SP),
and I'd expect 80% of SP peak, or about 19 Gflops/core, for this comparison
(Opterons can do 90%, at least on my machine using HPL.)

so the paper shows 80.6 Gflops SGEMM for 8 SPEs; it's only
fair to compare this to 2 or 4 Core2 cores (37.5 and 75 Gflops!)
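
for anyone who wants to check the arithmetic, here is a rough sketch in
Python.  the assumptions are mine: 4 DP or 8 SP flops per cycle per core
(my reading of the "4 or 8" above), and 80% of peak as the sustainable
fraction.  the function and constant names are made up for illustration.

```python
# Back-of-envelope peak and expected-sustained Gflops for Core2.
# Assumption: 4 DP or 8 SP flops per cycle per core.
CLOCK_GHZ = 2.93
FLOPS_PER_CYCLE = {"DP": 4, "SP": 8}
EFFICIENCY = 0.80  # assumed fraction of peak a tuned GEMM sustains


def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak for one core: clock rate times flops per cycle."""
    return clock_ghz * flops_per_cycle


def sustained_gflops(clock_ghz, flops_per_cycle, efficiency=EFFICIENCY, cores=1):
    """Expected sustained rate: peak scaled by efficiency and core count."""
    return peak_gflops(clock_ghz, flops_per_cycle) * efficiency * cores


if __name__ == "__main__":
    for precision, fpc in FLOPS_PER_CYCLE.items():
        for cores in (1, 2, 4):
            rate = sustained_gflops(CLOCK_GHZ, fpc, cores=cores)
            print(f"{precision} x{cores} core(s): {rate:.1f} Gflops expected")
```

the SP rows for 1, 2 and 4 cores are the ones that line up with the
19, 37.5 and 75 Gflops figures.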

> indicative of per core performance on Core 2. Is it safe to say that
> Core 2 achieves <15 gflops/core at 3ghz, assuming ~15% premium with Goto
> BLAS?

peak SGEMM/core would be 3*8=24, so 15 sounds quite low.

>> It looks like a preproduction 2.4 GHz Cell is 2-6 times faster than a

do you know of something crippled in the pre-production Cell chips?

it looks like 2x is about right to me, considering that full-production
Cell appears to ship about the same time as 4x Core2.  the main question
is whether that's good enough to make Cell more than a niche product.
I've talked with a number of my better users, and they all tend to want
>=10x speedup before considering non-GP approaches (cell, fpga, gpgpu).
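
to put numbers on that, a small sketch of the ratio, in Python.  it
assumes the paper's 80.6 Gflops SGEMM figure for Cell and my estimate of
80% of SP peak (2.93 GHz * 8 flops/cycle) per Core2 core; the names are
hypothetical and the 10x threshold is just what my users tell me.

```python
# Rough Cell-vs-Core2 SGEMM speedup, compared against a wanted threshold.
CELL_SGEMM_GFLOPS = 80.6             # 8 SPEs, figure quoted from the paper
CORE2_SP_PER_CORE = 2.93 * 8 * 0.80  # assumed 80% of SP peak per core
WANTED_SPEEDUP = 10.0                # what users say they want, at minimum


def speedup(cell_gflops, core2_cores, per_core=CORE2_SP_PER_CORE):
    """Ratio of Cell's rate to the estimated rate of N Core2 cores."""
    return cell_gflops / (per_core * core2_cores)


def worth_porting(cell_gflops, core2_cores, threshold=WANTED_SPEEDUP):
    """True only if the estimated speedup meets the threshold."""
    return speedup(cell_gflops, core2_cores) >= threshold


if __name__ == "__main__":
    for cores in (1, 2, 4):
        s = speedup(CELL_SGEMM_GFLOPS, cores)
        verdict = "yes" if worth_porting(CELL_SGEMM_GFLOPS, cores) else "no"
        print(f"vs {cores} Core2 core(s): {s:.2f}x, meets 10x? {verdict}")
```

against 2 cores the ratio comes out a little over 2x, and against 4 cores
it is close to parity, so none of the cases gets near 10x.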

> I guess my biggest objection to Mark's comment was the comparison of
> SGEMM implemented in an experimental language with unproven structure
> with a theoretical calculation of Core 2 peak performance. I'd simply

I don't think there's anything too dubious about 80% of theoretical for 
Core2.  but I also didn't think the Sequoia stuff was such a cheap hack
as you imply (not to put words into your mouth ;)

> like to see a benchmark comparison of SGEMM (and DGEMM) using Core
> 2-optimized BLAS vs. Cell-optimized BLAS, thereby making a useful
> conclusion about how interesting Cell is for HPC.

actually, Sequoia seems precisely like the structure you need to make Cell
work, since its whole purpose is to express the rather constrained way that
memory is used in Cell.  the paper is actually pretty clear on where the Cell
spends its time, and for SGEMM, it's executing the "leaf" code, which is
IBM's Cell library.

I guess the prototype might be really bad, or Sequoia might be broken in a
way not hinted in the paper, or IBM's Cell intrinsic library could be
terrible.  but the paper seems on the up-and-up, and the scaling curves and
leaf-vs-communication figures surely make Cell look underwhelming,
at least if you assume, as I do, that it has to deliver a large speedup
to be worth investing in...

regards, mark hahn.


