[Beowulf] Vector coprocessors
Daniel Pfenniger
daniel.pfenniger at obs.unige.ch
Thu Mar 16 00:04:32 PST 2006
The shipment of this accelerator card has been delayed many times. The last
time I asked was in October 2005. Apparently the first shipment was made this
month, for a Japanese supercomputer with 10^4 Opterons. The cost is not
indicated, but a price somewhere above $8000 per card would put it outside
commodity hardware. I wouldn't be astonished if more performance could
be obtained in most applications with commodity clustering.
If Clearspeed considered mass production at a cost of around $100 to $500
per card, the market would be huge, because the card would then be competing
with multi-core processors like the IBM-Sony Cell.
Possibly the most interesting niche for the Clearspeed cards, it appears to
me, is accelerating proprietary applications like Matlab, Mathematica and
particularly Excel, which run on a single PC and can hardly be reprogrammed
by their users to run on a distributed cluster.
Dan
Bill Broadley wrote:
> I noticed a few news reports on Intel/AMD considering the Clearspeed
> co-processor.
>
> Looks like a fairly interesting widget, here's an Intel/Clearspeed paper
> that describes it:
> http://www.clearspeed.com/downloads/Intel%20Math%20Kernel%20whitepaper.pdf
>
> Some interesting snippets on the Clearspeed advance board:
> * 192 pipelines, 2 flops per clock (not fused), 250 MHz, peak 96 GFlops
>   (I believe this is for 2 chips)
> * 50 GFlops sustained with the DGEMM kernel
> * 1 GB of RAM per board.
> * 128 registers per PE, register file allows 3 reads 2 writes per clock
> * 1.44 MB of SRAM that can deliver one word per FP op per clock.
> * 800 MB/sec over PCI-X, enough for 50 GFlops on DGEMM.
> * Less than 10 watts while sustaining 25 GFlops
> * 1-D complex FFTs of 1024 elements @ 400k per second (20 GFlops with 32-bit),
>   but only 1/4th of that streaming because of PCI-X bottlenecks.
> * 12 GFlops when running 2-D FFTs (512x512 single precision) that are
>   resident on board (in the 1 GB)
>
> In any case it looks like an interesting development.
>
> Speaking of which, what is the double precision peak rate of today's P4
> and Opteron? One 128-bit SSE operation every other cycle (so one 64-bit
> flop per cycle)? I believe Intel mentioned doubling this rate at IDF
> (shipping sometime in the 2nd half of this year).
>
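
The figures quoted above can be cross-checked with a few lines of arithmetic.
The sketch below is only a sanity check under stated assumptions: the
5*N*log2(N) flop count for a complex FFT is the usual convention and is assumed
here (the whitepaper's own counting is not given in this thread), the function
names are made up for illustration, and the 2.0 GHz host clock in the last
line is a hypothetical value, not a measured one.

```python
import math

def peak_flops(pipelines, flops_per_clock, clock_hz):
    """Theoretical peak: every pipeline issues its flops on every clock."""
    return pipelines * flops_per_clock * clock_hz

def complex_fft_flops(n):
    """Conventional 5*N*log2(N) flop estimate for a length-N complex FFT
    (an assumed counting convention, not taken from the whitepaper)."""
    return 5 * n * math.log2(n)

def sse_dp_peak(clock_hz, ops_per_cycle=0.5, doubles_per_op=2):
    """Peak for one 128-bit SSE op every other cycle, 2 doubles per op,
    which is the rate guessed at in the quoted question."""
    return clock_hz * ops_per_cycle * doubles_per_op

# Board figures as quoted: 192 pipelines, 2 flops per clock, 250 MHz.
board_peak = peak_flops(192, 2, 250e6)

# 400k FFTs per second of 1024 complex points each.
fft_rate = complex_fft_flops(1024) * 400e3

# Flops the DGEMM kernel must perform per byte moved over the bus
# to sustain 50 GFlops on an 800 MB/sec link.
dgemm_flops_per_byte = 50e9 / 800e6

# Hypothetical 2.0 GHz host CPU at one double precision flop per cycle.
host_peak = sse_dp_peak(2.0e9)

for name, value in [("board peak", board_peak),
                    ("1024-pt FFT rate", fft_rate),
                    ("host peak (assumed clock)", host_peak)]:
    print(f"{name}: {value / 1e9:.2f} GFlops")
print(f"DGEMM flops per bus byte: {dgemm_flops_per_byte:.1f}")
```

With these conventions the 96 GFlops peak and the roughly 20 GFlops FFT figure
both come out consistent with the list, and sustaining 50 GFlops on DGEMM over
the bus requires about 62.5 flops per transferred byte, which is plausible only
because DGEMM reuses each operand many times from on-board memory.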