[Beowulf] Quasi-Non-Von-Neumann hardware in a Beowulf cluster.

Robert G. Brown rgb at phy.duke.edu
Thu Mar 10 12:29:11 PST 2005

On Thu, 10 Mar 2005, Joe Landman wrote:

> So there is an expression that I like attributing to myself, but I may 
> have "borrowed" it from elsewhere.
> 	Something designed to fail often will.
> The "general purpose" accelerator cards (transputer, NS32032, ...) all 
> suffered from a lack of application focus among other things.  There was 
> the prevalent attitude of "if you build it, then they will buy".  These 
> units largely failed to take hold apart from tiny niches.
> OTOH, "specialized" accelerator cards (Graphics cards, RAID cards, Sound 
> cards) have been a smashing success, as the CBA makes sense, they 
> deliver a specific value, and they are easy to use.  The take home 
> message is that any accelerator card needs to do the same.  What these 
> accelerator cards do is offload work from the CPU.  Not all of them 
> will work as businesses, and this isn't a magical formula for success.

And you have the volume issue.  Offhand, I can easily think of at least
a few HPC coprocessor cards that might be useful in a cluster:

A $30 PCI-bus card that does nothing but generate super-high-quality
random numbers (uniform deviates of various widths and/or ints of
various widths) at high speed (faster than the CPU can, which means say
50 megarands/second or better) and deliver them directly to memory
without the CPU's help (so one can build a circular queue and keep it
full with only occasional calls requesting the next block of rands)
would be a Great Boon to Monte Carlo-heads like myself.

A $30 linear algebra card.  Yes, I know -- a lot of graphics cards are
essentially vector processors and can be used in this way, but I'm not
satisfied.  These cards aren't DESIGNED to be used as general purpose
LAPACK-like or BLAS-like engines.  I can't help but think that one could
design a set of such cards that would function like a little
mini-cluster even within a single system, partitioning the problem and
doing sub-blocks, all in parallel with the main CPU and working directly
with memory.

There are probably more, but random numbers and linear algebra are both
major components of a lot of work.  Look at the problem here.  Your $30
graphics chip is used in tens of millions of units per year.  Your $30
random number generator card a) has to "work", which is not trivial to
arrange.  I have a bitwise random number test in dieharder (a GPL
package for testing random numbers that I'm working on) that every
supposedly good rng in the GSL fails at six bits, most still fail at
five bits, and quite a few fail at four.  That is, forget uniform
distribution of BYTES, let alone 4-byte sequences -- there are
measurable deviations from random in mere 6-bit substrings of a long
string of bits.  Hardware rng's are often no better.  b) You have to be
able to sell enough to make money, and that will be tough at $30/card...

Ditto for linear algebra, although there is a high-end market and
companies DO sell engines for a lot more than $30 to a very small
market.  And make money.

We just want the best of both worlds...

> Moreover, the "specialized" GPUs seem to have applicability in CFD and 
> other areas.  This is interesting as it opens a possibility for 
> significant acceleration of some computations.  They fundamental 
> question is whether or not there will be wide adoption.  I am not seeing 
> wide adoption of the GPU as a CFD engine right now, but what if you had 
> a "CFD engine" chip that cost about the  same as the GPU, stuck it on a 
> card, and had a high level language interface to it, so you hand it your 
> expensive routines to crank on.
> The physics chip bit got me thinking along the molecular dynamics lines 
> last night, specifically the non-bonded calculations.  I am sure others 
> could regale us with their computational burdens (and I would like to 
> hear them myself at some point in time, it is quite instructive to hear 
> what people are worrying about).

Ya, stuff like this would be great -- ODE solvers on a chip or add-on
card.  But NOT easy to build and NOT that big a market.


> I think the physics chip in hardware is a neat idea, though I think you 
> need a high level interface to it, open standards, and lots of support 
> to make it work.  Moreover, it needs to be programmable: not because 
> physics changes so often, but because the implied models may differ from 
> what you want.
> As I said, I am curious, and I think it is an interesting idea.  If done 
> right, with the wind at the right angles, good user/community support, I 
> think it could work :)

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
