[Beowulf] Vector coprocessors

Bill Broadley bill at cse.ucdavis.edu
Wed Mar 15 18:31:17 PST 2006

I noticed a few news reports on Intel/AMD considering the Clearspeed

Looks like a fairly interesting widget; here's an Intel/Clearspeed paper
that describes it:

Some interesting snippets on the Clearspeed Advance board:
* 192 pipelines, 2 flops per clock (not fused), 250 MHz, peak 96 GFlops
  (I believe this is for 2 chips)
* 50 GFlops sustained with the DGEMM kernel
* 1 GB of RAM per board.
* 128 registers per PE, register file allows 3 reads 2 writes per clock
* 1.44 MB of SRAM that can deliver one word per FP op per clock.
* 800 MB/sec over PCI-X, enough for 50 GFlops on DGEMM.
* Less than 10 watts while sustaining 25 GFlops
* 1-D complex FFTs of 1024 elements @ 400k per second (20 GFlops with 32-bit),
  but only a quarter of that when streaming because of PCI-X bottlenecks.
* 12 GFlops when running 2-d FFTs (512x512 single precision) that are
  resident on board (in the 1GB)
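
The figures above can be sanity-checked with some quick arithmetic.  What
follows is only a rough sketch, not anything taken from the paper: the
function names are made up for illustration, the FFT flop count uses the
conventional 5*N*log2(N) estimate, and the streaming estimate assumes
8 bytes per single-precision complex element with only one direction of
PCI-X traffic counted.

```python
import math


def peak_gflops(pipelines, flops_per_clock, clock_mhz):
    """Peak rate: every pipeline retires flops_per_clock flops each cycle."""
    return pipelines * flops_per_clock * clock_mhz * 1e6 / 1e9


def fft_gflops(n, ffts_per_second):
    """Assumes the conventional 5*N*log2(N) flop count for a complex FFT."""
    return 5 * n * math.log2(n) * ffts_per_second / 1e9


def flops_per_byte(gflops, bus_mb_per_s):
    """Flops the kernel must perform per byte moved to keep the bus from limiting it."""
    return gflops * 1e9 / (bus_mb_per_s * 1e6)


def streamed_fft_fraction(n, bytes_per_element, bus_mb_per_s, resident_ffts_per_s):
    """Fraction of the on-board FFT rate the bus can feed (one direction only, an assumption)."""
    bus_ffts_per_s = bus_mb_per_s * 1e6 / (n * bytes_per_element)
    return bus_ffts_per_s / resident_ffts_per_s


board_peak = peak_gflops(192, 2, 250)          # 192 pipelines, 2 flops/clock, 250 MHz
fft_rate = fft_gflops(1024, 400_000)           # 1024-point FFTs at 400k per second
dgemm_intensity = flops_per_byte(50, 800)      # 50 GFlops over an 800 MB/sec bus
stream_fraction = streamed_fft_fraction(1024, 8, 800, 400_000)

print(f"peak:            {board_peak:.1f} GFlops")
print(f"1-D FFT rate:    {fft_rate:.2f} GFlops")
print(f"DGEMM intensity: {dgemm_intensity:.1f} flops/byte")
print(f"streamed FFTs:   {stream_fraction:.2f} of the resident rate")
```

Under those assumptions the numbers come out at 96 GFlops peak, about
20 GFlops for the FFTs, and roughly a quarter of the FFT rate when
streaming, which is consistent with the snippets quoted.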

In any case it looks like an interesting development.

Speaking of which, what is the double-precision peak rate of today's P4
and Opteron?  One 128-bit SSE operation every other cycle (so one 64-bit
flop per cycle)?  I believe Intel mentioned doubling this rate at IDF
(shipping sometime in the 2nd half of this year).
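
To make the guess above concrete, here is a small sketch of the
arithmetic.  It only encodes the assumption in the question (one 128-bit
SSE operation issued every other cycle); it is not a measured or
documented figure, and the 3.0 GHz clock is purely an illustrative
value, not a claim about any particular chip.

```python
def dp_flops_per_cycle(vector_bits, cycles_per_op, element_bits=64):
    """Double-precision flops per cycle for one vector op every cycles_per_op cycles."""
    lanes = vector_bits // element_bits   # 128-bit register holds 2 doubles
    return lanes / cycles_per_op


def peak_dp_gflops(clock_ghz, flops_per_cycle):
    """Peak rate is simply clock rate times flops retired per cycle."""
    return clock_ghz * flops_per_cycle


assumed_clock_ghz = 3.0                    # illustrative only

current = dp_flops_per_cycle(128, 2)       # one SSE op every other cycle
doubled = dp_flops_per_cycle(128, 1)       # the rumored doubling: one op per cycle

print(f"current guess: {current} flop/cycle, "
      f"{peak_dp_gflops(assumed_clock_ghz, current)} GFlops at {assumed_clock_ghz} GHz")
print(f"doubled:       {doubled} flops/cycle, "
      f"{peak_dp_gflops(assumed_clock_ghz, doubled)} GFlops at {assumed_clock_ghz} GHz")
```

If the guess is right, peak double-precision GFlops equals the clock rate
in GHz today, and twice that after the doubling.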

Bill Broadley
Computational Science and Engineering
UC Davis
