[Beowulf] Has anyone actually seen/used a cell system?
agshew at gmail.com
Sun Oct 1 17:28:32 PDT 2006
On 10/1/06, Geoff Jacobs <gdjacobs at gmail.com> wrote:
> > interesting that for a 2.4GHz Cell, they get at most 10 FP Gflops per SPE.
> > does anyone have SGEMM numbers for a 3GHz Intel Core2? I'll guess that
> > efficiency of libgoto with 2 threads would be >= 80%, so flops would be
> > .8*2*8*3 =~ 40 Gflops, or half a Cell chip. makes it hard to argue for
> > wide use of Cell, I think...
> Unfortunately, the reality is a little crappier. Sciencemark 2.0 SGEMM
> sees 11 gflops on an E6700. DGEMM sees 5-6 gflops.
The same site reports that the X6800, a 2.93 GHz Core2, sees
almost 12.5 SP GFLOPS using ScienceMark 2.0 (6.2 DP GFLOPS).
I don't know much about ScienceMark. The website has been
replaced with advertisements. From what I gathered from several
review sites, it is MS Windows only and single threaded. My
guess is that Goto's implementation would perform significantly
better even with a single thread. Unfortunately, I looked all over
and couldn't find Core2 benchmarks using Goto's BLAS.
> The linked article _is_ an evaluation of performance on an actual Cell
> chip. Unfortunately, it's a lower clocked pre-production example running
> an experimental pseudo-compiler. I'm interested in seeing SGEMM using
> Cell-specific intrinsics. Such a benchmark should represent the maximum
> practical performance peak.
> Note: even if the Sequoia numbers are approximately the same as SPE
> intrinsics, cell is still 7x faster than Core2.
The Sequoia implementation used IBM's Cell SDK, according to the paper.
It looks like a preproduction 2.4 GHz Cell is 2-6 times faster than a 2.93 GHz
Core2 at SGEMM. That's an awfully big range, so hopefully someone
will be kind enough to benchmark libgoto on Core2 for us. The history file
indicates that libgoto is optimized for Core2, but I don't have one to test.
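
For what it's worth, the 2-6x range falls straight out of the figures quoted
in this thread. Here is a rough sketch of the arithmetic in Python. It assumes
a full Cell chip has 8 SPEs (which is what "40 Gflops, or half a Cell chip"
implies) and that the per-SPE SGEMM rate scales across all of them. The
function and variable names are mine, not from the paper or from libgoto.

```python
# Back-of-the-envelope SGEMM comparison using only figures quoted in this thread.
# Assumption: a full Cell chip has 8 SPEs and the per-SPE rate scales linearly.

def peak_gflops(cores, flops_per_cycle, ghz):
    """Theoretical peak in GFLOPS: cores * flops per cycle * clock in GHz."""
    return cores * flops_per_cycle * ghz

# Core2 guess from the quoted message: 2 threads, 8 SP flops/cycle, 3 GHz,
# with libgoto assumed to reach 80% of peak (a guess, not a measurement).
core2_peak = peak_gflops(cores=2, flops_per_cycle=8, ghz=3.0)
core2_goto_guess = 0.8 * core2_peak          # about 38.4 GFLOPS

# Cell: "at most 10 FP Gflops per SPE" on the 2.4 GHz preproduction part.
cell_sgemm = 8 * 10.0                        # 80 GFLOPS for the whole chip

# Measured ScienceMark 2.0 SGEMM results quoted above (single-threaded).
x6800_sciencemark = 12.5
e6700_sciencemark = 11.0

ratios = {
    "Cell vs libgoto guess (2 threads)": cell_sgemm / core2_goto_guess,
    "Cell vs X6800 ScienceMark": cell_sgemm / x6800_sciencemark,
    "Cell vs E6700 ScienceMark": cell_sgemm / e6700_sciencemark,
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x")
```

The low end (about 2x) is Cell against the optimistic two-thread libgoto
guess, the high end (about 6x) is Cell against the measured X6800 ScienceMark
number, and the E6700 figure gives the roughly 7x mentioned in the quoted note.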