[Beowulf] Has anyone actually seen/used a cell system?

Geoff Jacobs gdjacobs at gmail.com
Sun Oct 1 14:29:10 PDT 2006


Mark Hahn wrote:
>> They have a paper that explains it well and has some
>> interesting benchmarks.
>>
>> http://sc06.supercomputing.org/schedule/pdf/pap225.pdf
> 
> this is quite interesting.  I wish they had done benchmarks with doubles,
> especially since they alluded to, for instance, the n-body calculation
> really needing at least careful consideration of precision/resolution.
> (now that I think of it, using 23 bits of mantissa on a 256^3 FFT sounds
> numerically dubious too.)
> 
> interesting that for a 2.4GHz Cell, they get at most 10 FP Gflops per SPE.
> does anyone have SGEMM numbers for a 3GHz Intel Core2?  I'll guess that
> efficiency of libgoto with 2 threads would be >= 80%, so flops would be
> .8*2*8*3 =~ 40 Gflops, or half a Cell chip. makes it hard to argue for
> wide use of Cell, I think...

Unfortunately, the reality is a little crappier. Sciencemark 2.0 SGEMM
sees 11 Gflops on an E6700. DGEMM sees 5-6 Gflops.
http://www.pcper.com/article.php?aid=265&type=expert&pid=3
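
To put those measurements next to the estimate quoted above, here is a
rough back-of-envelope sketch. The E6700 clock (2.66 GHz) and the
double-precision rate of 4 flops/cycle/core are my assumptions, not
figures from the linked article; the 8 flops/cycle/core single-precision
figure is the one used in the quoted estimate.

```python
# Back-of-envelope check: what fraction of theoretical peak do the
# Sciencemark results reach? Hardware figures below are assumptions.

def peak_gflops(cores, flops_per_cycle, ghz):
    """Theoretical peak in Gflops: cores * flops/cycle/core * clock in GHz."""
    return cores * flops_per_cycle * ghz

E6700_GHZ = 2.66          # assumed stock clock of the E6700
SP_FLOPS_PER_CYCLE = 8    # per core, as in the quoted .8*2*8*3 estimate
DP_FLOPS_PER_CYCLE = 4    # assumed: half the single-precision rate

sp_peak = peak_gflops(2, SP_FLOPS_PER_CYCLE, E6700_GHZ)
dp_peak = peak_gflops(2, DP_FLOPS_PER_CYCLE, E6700_GHZ)

sgemm_eff = 11.0 / sp_peak   # measured SGEMM: 11 Gflops
dgemm_eff = 5.5 / dp_peak    # measured DGEMM: midpoint of 5-6 Gflops

print(f"SP peak {sp_peak:.1f} Gflops, SGEMM efficiency {sgemm_eff:.0%}")
print(f"DP peak {dp_peak:.1f} Gflops, DGEMM efficiency {dgemm_eff:.0%}")
```

Under these assumptions both results land at about a quarter of peak,
well short of the 80% guessed for libgoto, so a tuned BLAS may well do
better than Sciencemark does.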

The Core2 result is an order of magnitude below the SGEMM predictions
for Cell in the LBL paper. Those LBL numbers, however, are only predictions.
http://www.lbl.gov/Science-Articles/Archive/sabl/2006/Jul/CellProcessorPotential.pdf#search=%22sgemm%20cell%22

The SC06 paper linked above _is_ an evaluation of performance on an
actual Cell chip. However, it's a lower-clocked pre-production example
running an experimental pseudo-compiler. I'm interested in seeing SGEMM
using Cell-specific intrinsics. Such a benchmark should represent the
practical performance ceiling.

Note: even if the Sequoia numbers are approximately the same as what SPE
intrinsics would give, Cell is still roughly 7x faster than Core2 on SGEMM.
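
For anyone checking the ratio, a minimal sketch of the arithmetic. The
count of 8 SPEs is my assumption (it matches the "half a Cell chip"
remark in the quoted text); the per-SPE and Core2 figures are the ones
cited above.

```python
# Rough Cell vs. Core2 SGEMM ratio from the figures quoted in this thread.
GFLOPS_PER_SPE = 10.0   # "at most 10 FP Gflops per SPE" (2.4GHz Cell)
SPES = 8                # assumed number of SPEs per chip
CORE2_SGEMM = 11.0      # Sciencemark 2.0 SGEMM on an E6700, in Gflops

cell_sgemm = GFLOPS_PER_SPE * SPES      # whole-chip upper bound
speedup = cell_sgemm / CORE2_SGEMM

print(f"Cell ~{cell_sgemm:.0f} Gflops vs Core2 {CORE2_SGEMM:.0f} Gflops: "
      f"{speedup:.1f}x")
```

This uses the per-SPE maximum, so it is an upper bound on what the paper
measured rather than a sustained figure.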

-- 
Geoffrey D. Jacobs

Go to the Chinese Restaurant,
Order the Special


