[Beowulf] Has anyone actually seen/used a cell system?

Vincent Diepeveen diep at xs4all.nl
Thu Sep 21 06:36:56 PDT 2006

Actually, neither of the documents you posted discusses anything; they just
trumpet CELL big time. So much, in fact, that it is totally DISGUSTING.

Furthermore, the comparisons are against totally outdated IA64/Opteron
processors.


Nowadays they're not only clocked higher but also dual core. And of course
I'm missing Intel's flagship Core2 in the comparison. If you evaluate a
processor that hasn't been released yet, you ought to compare it with the
hardware that Intel and AMD release at the moment that CELL is released.

Now I realize that the projections for CELL were for 2005 and that this
document was probably written in 2003.

It is not very nice to assume that IBM can release a 65 nm product, write a
big article on it, and compare it with a product from AMD that is 2
generations older, namely 130 nm.

Intel's x86-64 chips are not even used in the comparison; partly this is of
course because Intel was rather vague about its Core2 plans.

However, it is obvious throughout history that Intel has always been leading
in process technology for its x86 chips, usually 6 months ahead of AMD.

So back then in 2003 it was very clear that in 2005 there would be a dual
core 2.4 GHz Montecito (which became 2006 and just 1.6 GHz, as they seem not
to put much effort into Itanium anymore, let alone give it good process
technology; all of that was unknown at the time).

So comparing a chip that hasn't been released yet against existing hardware
is not very nice.

It was clear that by the end of 2005 Intel planned to release at least dual
core x86 chips.

None of that has been used in the extrapolation in this document.

Very weak is the fact that it is just busy with gflops again. It's not
difficult to get a zillion gflops in very weak applications.

Just move register to register and you can get big gflops, hip hip hooray.

Your comment that it answers something about multiplication latencies does
not hold, unless a multiplication is as fast as moving a register or
performing an XOR.

That is usually not the case.

There is not a single word on the number of cycles the multiplication unit
needs. It is also quite vague about the speed of a single precision
multiplication.

You can of course, in a rather dumb (schoolboy) manner, multiply in single
precision and build a double precision multiplication out of that. It
requires 4 multiplies rather than 1.

With Karatsuba, provided shifting isn't slow on the chip and is possible
somehow, you can do it with 3 multiplies. This puts heavy stress on the
multiplication unit, and if that unit is very slow, then you can't use
single precision to do double precision mathematics, which is what the
majority here uses.
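
To make the 4-versus-3 count concrete, here is a minimal sketch on plain
integers split into a high and a low half. It only illustrates the counting
argument; the function names and the half_bits parameter are made up for
this example, and emulating real double precision floating point on single
precision hardware needs more care (exponents, rounding) than this shows.

```python
def split(x, half_bits):
    """Split a non-negative integer into (high, low) halves."""
    mask = (1 << half_bits) - 1
    return x >> half_bits, x & mask


def schoolbook_mul(a, b, half_bits):
    """Full-width product built from 4 half-width multiplies."""
    a_hi, a_lo = split(a, half_bits)
    b_hi, b_lo = split(b, half_bits)
    hh = a_hi * b_hi  # multiply 1
    hl = a_hi * b_lo  # multiply 2
    lh = a_lo * b_hi  # multiply 3
    ll = a_lo * b_lo  # multiply 4
    return (hh << (2 * half_bits)) + ((hl + lh) << half_bits) + ll


def karatsuba_mul(a, b, half_bits):
    """Same product with 3 multiplies, paid for with extra adds and shifts."""
    a_hi, a_lo = split(a, half_bits)
    b_hi, b_lo = split(b, half_bits)
    hh = a_hi * b_hi  # multiply 1
    ll = a_lo * b_lo  # multiply 2
    # multiply 3: the operands here are one bit wider than a half
    cross = (a_hi + a_lo) * (b_hi + b_lo) - hh - ll
    return (hh << (2 * half_bits)) + (cross << half_bits) + ll
```

Note that the third Karatsuba multiply takes operands that are one bit wider
than the halves, so the saving only pays off if adds and shifts are cheap
compared to the multiply.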

So in order to see whether that is worth investigating, you need to know
something about the instruction latencies of what is usually the worst case
path in a processor, namely its multiplication unit.
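
A rough sketch of why the multiply unit's timing matters for the peak
figures quoted further down in this thread (14.6 Gflops for CELL, 24 Gflops
for a 3 GHz dual core Core2). The formula is just clock x units x flops per
instruction / issue interval. The inputs are assumptions that are not stated
anywhere in this thread: a 3.2 GHz CELL clock, 4 double precision flops per
SIMD instruction, and one such instruction issued every 7 cycles per SPE
versus every cycle on Core2.

```python
def peak_gflops(clock_hz, units, flops_per_instr, issue_interval_cycles):
    """Theoretical peak in Gflops for a given issue interval.

    issue_interval_cycles is how many cycles pass between two
    floating point instructions issued on the same unit.
    """
    per_unit = clock_hz * flops_per_instr / issue_interval_cycles
    return units * per_unit / 1e9


# Assumed inputs, see the text above; they are not from the posted documents.
cell_dp = peak_gflops(3.2e9, units=8, flops_per_instr=4, issue_interval_cycles=7)
core2_dp = peak_gflops(3.0e9, units=2, flops_per_instr=4, issue_interval_cycles=1)

print(f"CELL, 8 SPEs, double precision: {cell_dp:.1f} Gflops")
print(f"Core2, dual core, double precision: {core2_dp:.1f} Gflops")
```

Under these assumptions the two quoted peak numbers fall out of the issue
interval alone, which is why the cycle counts of the multiplication unit
say more than the headline gflops.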

Note that the comment is true that for 3D games the thing that matters is
single precision floating point. So that the chip crushes all opposition
for 3D is quite obvious.


----- Original Message ----- 
From: "Geoff Jacobs" <gdjacobs at gmail.com>
To: "Vincent Diepeveen" <diep at xs4all.nl>
Cc: "Mark Hahn" <hahn at physics.mcmaster.ca>; <J.A.Delcorso at larc.nasa.gov>; 
<beowulf at beowulf.org>
Sent: Wednesday, September 20, 2006 11:01 PM
Subject: Re: [Beowulf] Has anyone actually seen/used a cell system?

> Vincent Diepeveen wrote:
>> ----- Original Message ----- From: "Mark Hahn" <hahn at physics.mcmaster.ca>
>> To: <J.A.Delcorso at larc.nasa.gov>
>> Cc: <beowulf at beowulf.org>
>> Sent: Wednesday, September 20, 2006 6:51 PM
>> Subject: Re: [Beowulf] Has anyone actually seen/used a cell system?
>>>> Can anyone point me to a url, or tell me what their
>>>> experience is with this technology?  Is it as fast as
>>>> it's purported to be?
>>> I haven't come anywhere near a Cell, but then again, I'm not sure I'd
>>> want to.  14.6 Gflops (64b, and assuming the full 8 SPE's) isn't bad,
>>> but then again, a 3 GHz Core2 dual-core is 24 Gflops, and almost
>>> certainly a lot more accessible, shipping now, runs linux, supported
>>> by compilers and goto-blas, etc.
>> Comeon let's do some realistic comparision. Assuming IBM didn't totally
>> mess up,
>> let's do an objective compare for multiplication.
>> Gflops is an overrated definition simply.
>> The thing determining the number of matrix elements you can multiply a
>> second more than anything else,
>> is the slow instruction on most cpu called multiply.
> See the berkeley article, it's very thorough and discusses cycle time
> for most fp operations on a variety of CPUs.
> http://www.cs.berkeley.edu/~samw/projects/cell/EDGE06_abstract.pdf
> -- 
> Geoffrey D. Jacobs
> Go to the Chinese Restaurant,
> Order the Special
