[Beowulf] Re: coprocessor to do "physics calculations"

Mon May 8 09:32:34 PDT 2006

At 08:44 AM 5/8/2006, Robert G. Brown wrote:
>On Sat, 6 May 2006, SIM DOG wrote:
>
>>Further to the discussion, AnandTech has a review of an ASUS card 
>>sporting this beastie... (US$300)
>>
>>http://www.anandtech.com/video/showdoc.aspx?i=2751
>>
>>I can vaguely remember seeing some mention of AGEIA publishing the API. 
>>Just Newtonian gravity calcs would be just fine by me... then if only I 
>>could afford a baby GRAPE (there was some talk of a PCI-X card) :/
>>
>>http://astrogrape.org/
>
>Aaaaghh.
>
>It's alive, it's alive!
>The CM-5, the CM-5...:-)
>
>(Actually, not quite if the GRAPE-2004 is really like the CM-2 -- a SIMD
>design rather than MIMD.)
>
>There are a number of questions I would have about the architecture,
>mostly about IPCs and/or other bandwidth limitations.  Saying that it is
>"1 Petaflop" on what looks like a square inch of chip real estate is all
>well and good, but that sounds suspiciously like a theoretical peak
>speed of a (very) large number of SIMD pipelines.  At some point the
>real problem will be keeping them fed with an input dataflow, will it
>not?  As in I don't think that there are a lot of petabyte/sec channels
>out there to keep data moving through the petaflop chain.

Kind of depends on what's inside that box labelled "reduction tree" (Figure 
2 in the GRAPE DR paper).  They describe doing things like matrix 
multiplication, where each PE does a multiply, and the "reduction tree" 
does the summing.  But that implies an N-input adder, no trivial matter.
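
For illustration only: one common way to build an N-input adder is a tree of 
two-input adders, which costs N-1 adders and about log2(N) stages of latency. 
I don't know that this is what's actually in the GRAPE-DR box; the function 
name and the toy dot product below are my own assumptions, just a sketch in 
Python.

```python
def tree_sum(values):
    """Sum values with a pairwise adder tree.

    Returns (total, depth), where depth is the number of adder stages.
    Hypothetical helper for illustration, not taken from the GRAPE-DR paper.
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        # One stage: two-input adders working side by side on adjacent pairs.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next stage
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


# Each "PE" forms one product of a dot product; the tree does the summing.
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
products = [x * y for x, y in zip(a, b)]
total, depth = tree_sum(products)
print(total, depth)
```

The point is only that latency grows with log2(N) while the adder count grows 
with N, so a wide tree is a lot of silicon even though it is not slow.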

What it looks like is more of a SIFD (Single Instruction, Few Data) 
architecture.  The same data gets fed to all PEs and, presumably, the 
instruction stream has some systematic variations (PE #1 takes the first 
element, PE #2 takes the second element, etc.).  They do discuss the memory 
bandwidth issue on page 4.

Maybe a systolic array might be a better way to describe it?

But let's consider: say you've got PEs with a 1 ns propagation 
delay/cycle time (perhaps pipelined), so each PE can do "some sort" of 
operation every nanosecond (although the latency through a PE might be many 
ns, with pipelining).  To get to 1 Tflop, you'd need 1000 PEs, which is 
doable today.  To get to 1 Pflop, you'd need a million PEs, which seems a 
bit large, even for 2008.
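
Spelling out that arithmetic as a quick sketch (assuming one operation per PE 
per 1 ns cycle, which is my simplification; the function name is made up):

```python
def pes_needed(target_flops, pe_ops_per_sec=10**9):
    """PEs required to hit target_flops, assuming each PE retires
    pe_ops_per_sec operations per second (1 ns cycle -> 10**9)."""
    # Ceiling division in integers, to avoid floating-point rounding surprises.
    return -(-target_flops // pe_ops_per_sec)


tflop_pes = pes_needed(10**12)  # 1 Tflop
pflop_pes = pes_needed(10**15)  # 1 Pflop
print(tflop_pes, pflop_pes)
```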

And, in fact, the GRAPE-DR diagram (mostly in Japanese) shows one chip at 
1 TFLOP, a board with 8 chips, and pairs of boards in a PC for 16 TFLOP, 
then 4 banks of 16 PCs interconnected with 10-100 Gbps switches, to get to 
the 1 PFLOP.
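
A back-of-envelope check of that hierarchy, using only the numbers from the 
diagram as I read them (peak figures, nothing sustained; the variable names 
are mine):

```python
# Figures as read off the GRAPE-DR system diagram (peak, not sustained).
CHIP_TFLOPS = 1
CHIPS_PER_BOARD = 8
BOARDS_PER_PC = 2
PCS_PER_BANK = 16
BANKS = 4

board_tflops = CHIP_TFLOPS * CHIPS_PER_BOARD   # per board
pc_tflops = board_tflops * BOARDS_PER_PC       # per PC
bank_tflops = pc_tflops * PCS_PER_BANK         # per bank of PCs
system_tflops = bank_tflops * BANKS            # whole system

total_chips = CHIPS_PER_BOARD * BOARDS_PER_PC * PCS_PER_BANK * BANKS
print(board_tflops, pc_tflops, bank_tflops, system_tflops, total_chips)
```

So the 1 PFLOP is 1024 chips' worth of peak spread over 64 PCs, not one chip.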

http://grape-dr.adm.s.u-tokyo.ac.jp/system-en.html  - Here's a brand new 
English version.

Mind you, the whole thing is also supposed to do distributed computing, 
with multiple GRAPE-DR clusters interconnected by IPv4/IPv6 dual stacks at 
40-400 Gbps.  Good thing they've got some research group, because this is 
by no means a "rack and stack COTS" kind of beast.

Interestingly, a couple applications for a beast like this that they don't 
happen to mention on the web page include:

nuclear simulations: I would think that this might not be a bad way to do 
Monte-Carlo PIC kinds of models.  If it worked using rooms full of Friden 
calculators in the 40s, it will work with rooms full of little simulation 
engines in 2008.

parallel processing to decrypt encrypted data:  Run lots of possible keys 
against the ciphertext in parallel.  Worked for Colossus, worked for the EFF 
DES cracker box, would work just fine with the GRAPE.  There's an awful lot 
of similarity between genome analysis and cryptanalysis.
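
To make the "lots of keys in parallel" idea concrete, here is a toy sketch. 
Everything in it is an assumption for illustration: the cipher is a trivial 
16-bit XOR (nothing like DES), the PE count is arbitrary, and the "PEs" are 
just loop iterations run one after another.  The only point is that the 
keyspace splits into independent slices, with every PE running the same code 
against the same ciphertext.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: int) -> bytes:
    """Toy cipher: XOR with a repeating 16-bit key (NOT secure)."""
    key_bytes = key.to_bytes(2, "big")
    return bytes(d ^ k for d, k in zip(data, cycle(key_bytes)))


def search_slice(ciphertext, crib, pe_index, n_pes, keyspace=1 << 16):
    """What one hypothetical PE would do: try keys pe_index, pe_index + n_pes, ...
    and report the first key whose decryption starts with the known crib."""
    for key in range(pe_index, keyspace, n_pes):
        if xor_cipher(ciphertext, key).startswith(crib):
            return key
    return None


secret_key = 0xBEEF
ciphertext = xor_cipher(b"ATTACK AT DAWN", secret_key)

n_pes = 16  # arbitrary; each slice is independent of the others
hits = []
for pe in range(n_pes):
    found = search_slice(ciphertext, b"ATTACK", pe, n_pes)
    if found is not None:
        hits.append(found)

print([hex(k) for k in hits])
```

No PE ever needs to talk to another one until somebody gets a hit, which is 
why this kind of problem suits a broadcast-the-same-data machine.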

>Of course there are things such a card could definitely be used for.
>For example, I'd guess that one could load the pipes with N seeds of a
>good rng and shuffle the output and generate a whole lot of
>rng's/second, which would make simulationists (like myself) potentially
>very happy.  I may be a bit cynical about getting a "real" petaflop
>(e.g. sustained on a real dataflow) out of a single chip (and even more
>so out of a card with 8 chips on it) but hey, if the price was right and
>the programming was easy it might be worth it.
>
>    rgb
>
>>
>>Cheers
>>Stevo
>>_______________________________________________
>>Beowulf mailing list, Beowulf at beowulf.org
>>To change your subscription (digest mode or unsubscribe) visit 
>>http://www.beowulf.org/mailman/listinfo/beowulf
>
>--
>Robert G. Brown                        http://www.phy.duke.edu/~rgb/
>Duke University Dept. of Physics, Box 90305
>Durham, N.C. 27708-0305
>Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875