[Beowulf] coprocessor to do "physics calculations"

Joe Landman landman at scalableinformatics.com
Fri May 5 10:00:39 PDT 2006



Robert G. Brown wrote:

> I agree -- some sort of coupled ODE solver, maybe with some firmware
> support for things like gravitation and drag forces built on top of a
> DSP/vector processor?

Likely it has a linearized version of an ODE solver.  Nothing 
spectacular.  Shouldn't be that hard to code a simple Runge-Kutta or 
even Simpson's rule in hardware ...
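
(For the curious, a fixed-step RK4 for dy/dt = f(t, y) really is just a 
handful of multiply-adds per step, which is why it is so tempting to put 
in silicon.  A toy sketch in C follows; the f() in it is an illustrative 
example I picked, not anything the card actually runs.)

/* Minimal fixed-step RK4 for dy/dt = f(t, y).  Illustrative only; the
 * right-hand side f() below is a toy problem, dy/dt = -y.            */
#include <stdio.h>

static double f(double t, double y)
{
    (void)t;           /* autonomous toy problem */
    return -y;         /* exact solution: y(t) = exp(-t) */
}

static double rk4_step(double t, double y, double h)
{
    double k1 = f(t,           y);
    double k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
    double k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
    double k4 = f(t + h,       y + h * k3);
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
}

int main(void)
{
    double t = 0.0, y = 1.0, h = 0.01;
    int i;
    for (i = 0; i < 100; i++) {    /* integrate out to t = 1 */
        y = rk4_step(t, y, h);
        t += h;
    }
    printf("y(1) = %g (exact: %g)\n", y, 0.36787944117);
    return 0;
}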

> That could certainly be useful to physics simulations depending on a)
> its speed; and b) its programmability.  If it is all in the card
> firmware with a high level API, you may not be able to do real physics
> with it (at least in a way that would be useful to most folks).  As in
> \vec{F} = -mg\hat{z} won't work for cosmologists, dynamics on the moon
> (unless g is adjustable), and so on.

Likely it has "game physics" and the major question is how do game 
physics map against the real world physics.
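
(To rgb's point about \vec{F} = -mg\hat{z}: whether "game physics" is 
usable for real work often comes down to whether constants like g are 
baked into the firmware/API or exposed as parameters.  A trivial C 
sketch; the struct and field names are invented for illustration, not 
taken from any real engine:)

/* Invented names, purely to illustrate the point about adjustability. */
typedef struct {
    double g;   /* gravitational acceleration: 9.81 for an Earth-bound
                   game, 1.62 for lunar dynamics, or whatever GM/r^2
                   gives you for your problem                          */
} world_params;

/* z-component of the gravitational force on a mass m: F_z = -m g */
static double gravity_fz(const world_params *w, double m)
{
    return -m * w->g;
}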

> There is also the issue of speed.  The reason to have it do physics as a
> coprocessor task in gaming is that the CPU has other things to do --

That's not the only reason ...  Dedicated hardware, well designed for 
the task, can absolutely wipe the floor with general purpose hardware.  
GPU vs CPU is a great example of this.

> e.g. process the interactive I/O stream, manage the graphical
> presentation of the objects in the game field of view, implement the
> game's basic logic, play sounds in real time while doing all of this.

You want the graphics hardware doing as much work as possible, so you 
want the CPU not doing that work.  Hence you are going to push the 
graphics processing to the graphics processor, through this nice OpenGL 
API that runs everywhere...

> Offloading ODE solution clearly leaves it better able to manage all of
> this (much of which is interrupt driven) with less chance of game
> jerkiness as it swaps sub-tasks in and out (often a problem with WinXX
> to my own experience, at least -- it has gotten better as CPUs have
> gotten SO fast they smooth out issues with the scheduler, but as WinXX
> boxes have to manage relatively intense multitasking and interrupt
> processing they can still crump a bit).  So the coprocessor doesn't have
> to be "fast", just fast enough to keep up while freeing the main CPU to

I disagree.  For a coprocessor to be useful in any context it has to be 
"fast" in the sense of "not slower" than the alternative.  If you are 
looking to offload the CPU, then get a second CPU.  If you are looking 
to perform specialized calculations, then get the appropriate 
coprocessor (a la GPU, ...).

Note:  I have been calling these coprocessors "APUs" as of late, for 
Acceleration Processing Units.  GPUs are one example of these, but there 
are many others.  It just saves a little time/effort explaining things, 
I have found.

> have a good effect, with the best effect being on the oldest/slowest
> hardware or with games with a really nasty set of I/O streams.
> 
> For a real physics simulation, though, it is all about speed. 

Absolutely.

> I'll bet
> many a dime that an AMD 64 or Opteron doing the ODEs native would beat
> the same machine solving the same ODEs with the attached engine, and

For the PhysX APU (coprocessor), I would bet you are right, though I 
wouldn't put money on it.

For a different APU (coprocessor) on a different problem set, I would be 
betting heavily against you*.  GPUs are just one example.

> you've got all sorts of new I/O issues to deal with as the CPU has to
> tell it what to do and collect the results. 

You have those issues with GPUs and OpenGL.  They don't seem to hurt 
(well-designed API, well-designed boards/bus/...).
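
(The pattern is the same for any of these cards: stage data across the 
bus, run on the device, pull results back, and design the API so state 
stays resident on the card between steps.  The "APU" calls below are 
entirely made up, just to show the shape of the host/device interaction:)

#include <stddef.h>
#include <string.h>

/* Hypothetical APU interface.  None of these names are real; the stubs
 * just stand in for transfers over the bus and a kernel on the card.  */
#define CARD_WORDS 1024
static double card_mem[CARD_WORDS];               /* pretend on-card memory */

static void apu_upload(const double *host, size_t n)     /* host -> card */
{
    memcpy(card_mem, host, n * sizeof(double));
}

static void apu_run(size_t n)                             /* "kernel" on the card */
{
    size_t i;
    for (i = 0; i < n; i++)
        card_mem[i] *= 0.99;                               /* stand-in for real work */
}

static void apu_download(double *host, size_t n)           /* card -> host */
{
    memcpy(host, card_mem, n * sizeof(double));
}

/* The win comes from amortizing the bus crossings: upload once, run
 * many steps on the card, download once.                             */
void advance(double *state, size_t n, int steps)
{
    int i;
    apu_upload(state, n);
    for (i = 0; i < steps; i++)
        apu_run(n);
    apu_download(state, n);
}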

> I think that there will
> likely be only a very narrow range of real computations that might
> benefit although (as always) I would be perfectly happy to be proven
> wrong.

:) I think the modifier "very narrow" is incorrect.  I think "subset" is 
more correct.

They won't be useful for everything, but HPC tends to have a number of 
computing patterns (be they FP- or integer-heavy, etc.) that do make it 
amenable to acceleration.  Vector processors (the real ones) were a 
great example of this.  SIMD/SSE/Altivec are other examples (though IMO 
somewhat crippled ones).
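
(As a concrete case, a saxpy-style loop is exactly the pattern SSE was 
built for: four single-precision lanes per instruction.  Sketch in C 
with SSE intrinsics; it assumes an SSE-capable x86, n a multiple of 4, 
and 16-byte-aligned arrays, simplifications I made for the example:)

#include <xmmintrin.h>   /* SSE intrinsics */

/* y[i] += a * x[i], four floats per iteration.  Assumes n % 4 == 0 and
 * that x and y are 16-byte aligned (simplifications for the sketch).  */
void saxpy_sse(float a, const float *x, float *y, int n)
{
    __m128 va = _mm_set1_ps(a);
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_store_ps(y + i, vy);
    }
}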

> This may be yet another version of the famed "let's make a sony
> playstation (or Xbox, or DSP, or whatever) cluster" discussion.  They,
> too, have (or "are") integrated "physics" engines.  Yet it never quite
> makes sense compared to just going with the best general purpose CPU.

I strongly disagree with that last sentence.  The example I give (again) 
is the graphics card.  You can make precisely the same argument about 
doing graphics on the CPU versus on the GPU, namely that it simply 
doesn't make sense.  But that argument would be, IMO, wrong.

I think it is far more correct to use a tautology/Yogi-Berra-ism that 
APUs in general will be useful where they are useful.  That is, there 
exists a subset of problems amenable to acceleration.  My argument is 
that this subset is not minuscule.  It doesn't encompass the entire 
market either.  In the case of graphics, there is a minuscule portion of 
the market not served by acceleration products.

Joe

> 
>    rgb

* I only bet when I already know the answer :)


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


