[Beowulf] Teraflop chip hints at the future
Jim Lux
James.P.Lux at jpl.nasa.gov
Tue Feb 13 09:36:36 PST 2007
At 07:03 AM 2/13/2007, Richard Walsh wrote:
>Mark Hahn wrote:
>>>It looked like it did IEEE754 doubles. Any Intel types out there
>>>to confirm/deny?
>>singles:
>>
>>http://www.pcper.com/article.php?aid=363
>>
>>IMO, the chip is mainly interesting to explore how much we can abandon
>>the von Neumann architecture as a whole, rather than stupidly putting
>>more and more of them onto a chip. After all, the nearest-neighbor
>>latency (125 ps!) is comparable to a cache or even a register file.
>Yes, but how much does it really abandon von Neumann? It is just a lot
>of little von Neumann machines unless the mesh is fully programmable
>and the DRAM stacks can source data for any operation on any CPU as
>the application's data flows through the application kernel(s), however
>they are laid out across the chip. And in that case it is a multi-core
>ASIC emulating an FPGA ... why not just use an FPGA ... ;-) ... and
>avoid wasting all those hard-wired functional units that won't be
>needed for this or that particular kernel.
In fact, modern high-density FPGAs (viz., the Xilinx Virtex-II 6000
series) have partitioned their innards into little cells, some with an
ALU, combinatorial logic, and a little memory, and some with lots of
memory and not so much logic.
And you can program them in Verilog, which is a fairly high-level
language. There are huge libraries of useful functions out there
that you can "call". It's still a bit (a lot?) clunky compared to
zapping out C code on a general-purpose machine, but it can be done.
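As a concrete (if contrived) sketch of what that level of Verilog looks
like: the module and signal names below are my own invention, not from
any particular vendor library. It's a registered multiply-accumulate,
the sort of block FFT butterflies get built from.

    // Minimal illustrative sketch, not vendor code: a signed
    // multiply-accumulate that updates once per clock.
    module mac #(parameter WIDTH = 18) (
        input  wire                      clk,
        input  wire                      rst,
        input  wire signed [WIDTH-1:0]   a,
        input  wire signed [WIDTH-1:0]   b,
        output reg  signed [2*WIDTH+7:0] acc  // full product plus guard bits
    );
        always @(posedge clk) begin
            if (rst)
                acc <= 0;
            else
                acc <= acc + a * b;  // accumulate a*b every cycle
        end
    endmodule

Instantiating it is the "call" (x, coeff, and sum being whatever signals
your design has handy):

    mac #(.WIDTH(18)) m0 (.clk(clk), .rst(rst), .a(x), .b(coeff), .acc(sum));

The #(...) parameter override is what lets library blocks like this be
reused at different word widths.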
>[...] of an array of FPGA cores on a chip (super-FPGA model). Less wasted
>hardware. In some sense, these super, multi-mini-core designs are another
>ASIC hammer looking for a nail. Fixed-instruction architectures ultimately
>waste hardware. Why not program the processor instead of writing
>instructions for a predefined, one-size-fits-all ASIC?
I think that as a general rule, the special-purpose cores (ASICs) are
going to be smaller, lower power, and faster (for a given technology)
than the programmable cores (FPGAs). Back in the late 90s, I was
doing tradeoffs between general-purpose CPUs (PowerPCs), DSPs (the
ADSP-21020), and FPGAs for some signal processing applications. At
the time, the DSP could do the FFTs, etc., in the fewest joules and
the least time. Since then, however, the FPGAs have pulled ahead, at
least for spaceflight applications. But that's not because of
architectural superiority in a given process: it's that the FPGAs
are benefiting from improvements in process (higher density) and
nobody is designing space-qualified DSPs using those processes (so
the DSPs are stuck with the old ones).
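(For concreteness: the figure of merit in those trade studies was just
energy and latency per transform, E = P x t. With purely illustrative
round numbers of my own, not figures from the actual study:

    E_DSP = 0.5 W x 100 us = 50 uJ per 1024-point FFT
    E_CPU = 5 W   x 200 us = 1 mJ per 1024-point FFT

so a part like the DSP could win on both axes at once.)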
Heck, the latest SPARC V8 core from ESA (the LEON3) is often
implemented in an FPGA, although there are a couple of space-qualified
ASIC implementations (from Atmel and Aeroflex).
In a high-volume consumer application, where cost is everything, the
ASIC is always going to win over the FPGA. For more specialized
scientific computing, the trade is a bit more even. But even so,
the Beowulf concept of combining large numbers of commodity computers
leverages that consumer volume for the specialized application, giving
up some theoretical performance in exchange for dollars.
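(Back-of-the-envelope, with invented round numbers just to show the
shape of the trade: a commodity node at $2,000 delivering 10 GFLOPS
works out to $200/GFLOPS, while a custom low-volume machine at
$200,000 for 500 GFLOPS is $400/GFLOPS, even though each of its
processors is individually far faster.)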
James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875