[Beowulf] Has anyone actually seen/used a cell system?

Geoff Jacobs gdjacobs at gmail.com
Sun Oct 1 14:29:10 PDT 2006

Mark Hahn wrote:
>> They have a paper that explains it well and has some
>> interesting benchmarks.
>> http://sc06.supercomputing.org/schedule/pdf/pap225.pdf
> this is quite interesting.  I wish they had done benchmarks with doubles,
> especially since they alluded to, for instance, the n-body calculation
> really needing at least careful consideration of precision/resolution.
> (now that I think of it, using 23 bits of mantissa on a 256^3 FFT sounds
> numerically dubious too.)
> interesting that for a 2.4GHz Cell, they get at most 10 FP Gflops per SPE.
> does anyone have SGEMM numbers for a 3GHz Intel Core2?  I'll guess that
> efficiency of libgoto with 2 threads would be >= 80%, so flops would be
> .8*2*8*3 =~ 40 Gflops, or half a Cell chip. makes it hard to argue for
> wide use of Cell, I think...

Unfortunately, the reality is a little crappier. Sciencemark 2.0 SGEMM
sees 11 gflops on an E6700. DGEMM sees 5-6 gflops.
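
To put numbers on that gap, here is a rough sketch of the arithmetic. The 2 cores, 8 single-precision flops per cycle, 3 GHz and 0.8 efficiency are taken straight from the quoted .8*2*8*3 estimate, and the 11 Gflops is the Sciencemark figure above. The variable names are only illustrative, and the E6700 itself is clocked somewhat below 3 GHz, so treat the resulting fraction as approximate.

```python
# Back-of-envelope comparison: quoted Core2 SGEMM estimate vs. measured figure.
cores = 2                # dual-core Core2, as in the quoted estimate
flops_per_cycle = 8      # SP flops per cycle per core (assumed in the quoted estimate)
clock_ghz = 3.0          # clock used in the quoted estimate, not the E6700's actual clock
efficiency_guess = 0.8   # quoted guess for libgoto efficiency

peak_gflops = cores * flops_per_cycle * clock_ghz      # theoretical SP peak
estimated_gflops = efficiency_guess * peak_gflops      # the "=~ 40 Gflops" guess
measured_gflops = 11.0                                 # Sciencemark 2.0 SGEMM on an E6700

print(f"peak:      {peak_gflops:.1f} Gflops")
print(f"estimated: {estimated_gflops:.1f} Gflops")
print(f"measured:  {measured_gflops:.1f} Gflops "
      f"({measured_gflops / peak_gflops:.0%} of the assumed peak)")
```

On these assumptions the measured SGEMM result is under a quarter of the assumed peak, well short of the 80% guess.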

This is an order of magnitude less performance than the SGEMM predictions
in the LBL paper. Unfortunately, the LBL numbers are only predictions.

The linked article _is_ an evaluation of performance on an actual Cell
chip. Unfortunately, it's a lower-clocked pre-production example running
an experimental pseudo-compiler. I'm interested in seeing SGEMM using
Cell-specific intrinsics. Such a benchmark should represent the practical
performance peak.

Note: even if the Sequoia numbers are approximately the same as SPE
intrinsics, Cell is still roughly 7x faster than Core2 on SGEMM.
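
One way to arrive at a figure near 7x, as a sketch only: take the "at most 10 Gflops per SPE" from the quoted text, assume all 8 SPEs of a Cell are in use (the SPE count is my assumption here, not a number from the thread), and divide by the Sciencemark SGEMM result. The variable names are illustrative.

```python
# Rough Cell vs. Core2 SGEMM ratio from the numbers in this thread.
spes = 8                    # assumption: all 8 SPEs on one Cell chip are used
gflops_per_spe = 10.0       # upper per-SPE figure quoted from the paper
core2_sgemm_gflops = 11.0   # Sciencemark 2.0 SGEMM on an E6700

cell_gflops = spes * gflops_per_spe
ratio = cell_gflops / core2_sgemm_gflops

print(f"Cell ~{cell_gflops:.0f} Gflops, about {ratio:.1f}x Core2")
```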

Geoffrey D. Jacobs

Go to the Chinese Restaurant,
Order the Special
