[Beowulf] fast interconnects
James.P.Lux at jpl.nasa.gov
Mon Dec 5 07:33:53 PST 2005
At 06:47 AM 12/5/2005, you wrote:
> > There's all kinds of useful things one can do if you have a multi Gbit/sec
>well, the bar is at more like 1.5 GB/s right now. quadrics trails a bit
>because it's not full-duplex in its current pcix incarnation, but everything
>10GE or IB-based is well above. myri 2G is pretty much a legacy product,
>as is gigabit (only being nearly free keeps it from being obsolete...)
> > interconnect. Various signal processing (Synthetic Aperture Radar
> > processing, Hyperspectral imaging compression, signal analysis) spring
>I guess you mean that practical applications of this stuff would need to
>work on data much larger than a single node's memory (which is at least 2-16GB
>in the current market). I have users who swear by FFTW 2.x's MPI code,
>and claim that it works well even on gigabit-based systems.
Indeed. Think in terms of a real-time stream of data, as opposed to a
batch job: multiple data streams coming in from a sensor at 100 Mbps or more,
processed data coming out at a few Mbit/sec.
A trivial example (and clearly a bad use for a cluster, since there is
custom silicon available) is compressing digital video from the raw CCIR601
style samples (4:2:2, 270 Mbps) into a compressed stream at 19 Mbps. Most
of the compression is small 8x8 transforms.
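To put rough numbers on the video example, here is a minimal sketch of the data-rate arithmetic. The 13.5 MHz luma and 6.75 MHz chroma sampling rates are the standard CCIR 601 4:2:2 figures; the 10-bit sample width is an assumption chosen because it yields the 270 Mbps quoted above. The function names are purely illustrative.

```python
# Rough data-rate arithmetic for the CCIR 601 example above.
LUMA_RATE_HZ = 13.5e6      # luma sampling rate in CCIR 601
CHROMA_RATE_HZ = 6.75e6    # each of the two chroma channels in 4:2:2
BITS_PER_SAMPLE = 10       # assumption: 10-bit samples, which gives 270 Mbps


def raw_rate_bps():
    """Raw serial bit rate of the uncompressed 4:2:2 stream."""
    samples_per_sec = LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ
    return samples_per_sec * BITS_PER_SAMPLE


def compression_ratio(raw_bps, compressed_bps):
    """How many input bits are consumed per output bit."""
    return raw_bps / compressed_bps


if __name__ == "__main__":
    raw = raw_rate_bps()
    print(f"raw rate: {raw / 1e6:.0f} Mbps")
    print(f"compression ratio: {compression_ratio(raw, 19e6):.1f}:1")
```

So the node (or nodes) doing the compression has to swallow roughly fourteen times more data than it emits, which is why the input side of the interconnect is the part that hurts.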
Another example is where you have a wideband signal coming in, and you are
implementing some sort of digital analysis/receiver. Imagine digitizing
the entire Low VHF communications band from 30 to 88 MHz (about 120
Msamples/second), feeding it out to a raft of processors, and having each
processor find, extract and process one signal. Or, more practically,
you'd have a series of band receivers, each of which grabs, say, 10 MHz
wide and feeds it to the cluster. You want to track a frequency hopping
radio, so, when a signal disappears, you need to look for a new signal with
similar modulation characteristics popping up somewhere else at about the
same time.
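The "about 120 Msamples/second" figure can be checked with a short sketch. It assumes the 30 to 88 MHz band is mixed down or bandpass-sampled, so that the minimum real-valued sample rate is twice the bandwidth (sampling directly from DC up to 88 MHz would need at least 176 Msamples/second). The helper names are hypothetical, and a real design would add margin for filter roll-off.

```python
import math


def min_sample_rate_hz(f_low_hz, f_high_hz):
    """Lower bound on the real-valued sample rate: 2 x bandwidth.

    Assumes the band has been translated to baseband (or is bandpass
    sampled); this is a bound, not a recommended design value.
    """
    return 2.0 * (f_high_hz - f_low_hz)


def receivers_needed(f_low_hz, f_high_hz, receiver_bw_hz):
    """Number of fixed-width band receivers needed to tile the band."""
    return math.ceil((f_high_hz - f_low_hz) / receiver_bw_hz)


if __name__ == "__main__":
    rate = min_sample_rate_hz(30e6, 88e6)
    print(f"minimum sample rate: {rate / 1e6:.0f} Msamples/s")
    print(f"10 MHz receivers needed: {receivers_needed(30e6, 88e6, 10e6)}")
```

The 58 MHz of bandwidth gives a floor of 116 Msamples/second, consistent with the rounded figure above, and it takes six 10 MHz band receivers to cover the whole band.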
There ARE other ways to solve these problems, even with general purpose
hardware, particularly if you can "batch" the data. However batching the
data increases the latency, and there are applications where low latency is
required, and you don't want to wait until you've got 10 million samples to
start processing.
> > hard work, the problem has been getting the data in and out (a 1 GFLOP
> > processor doesn't buy you much if the data rate in and out is 60+
> > Megatransfer/second (=133 MHz:2 (1 in, 1 out)))
>really? what do you think the flops/byte ratio is for this domain?
Depending on the algorithms, it could be quite low.. a few multiplies and
adds per sample. In systolic arrays (which rely on extremely parallelized
algorithms) it might be only one op/sample. I'd never claim that a cluster
of commodity processors is a *good* way to implement a systolic array, but
it's an example where you have data flowing "through" a processor at a high
rate, but don't need much processing on each node.
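A rough sketch of the ops-per-sample budget, using the figures from the quoted text: a 1 GFLOP processor and 133 MHz / 2 = 66.5 Mtransfers/second in each direction. It assumes one sample per transfer, which is a simplification, and the function names are just for illustration.

```python
def flops_available_per_sample(peak_flops, samples_per_sec):
    """Floating-point operations the CPU could spend on each sample."""
    return peak_flops / samples_per_sec


def cpu_utilization(ops_per_sample, peak_flops, samples_per_sec):
    """Fraction of peak compute used when I/O runs at full rate."""
    return min(1.0, ops_per_sample * samples_per_sec / peak_flops)


if __name__ == "__main__":
    PEAK_FLOPS = 1e9          # the 1 GFLOP processor from the quoted text
    SAMPLE_RATE = 133e6 / 2   # 66.5 Mtransfers/s in, same again out
    # Assumption: one sample per transfer.

    budget = flops_available_per_sample(PEAK_FLOPS, SAMPLE_RATE)
    print(f"budget: {budget:.1f} flops per sample")
    for ops in (1, 4):
        util = cpu_utilization(ops, PEAK_FLOPS, SAMPLE_RATE)
        print(f"{ops} op(s)/sample -> {util:.1%} of peak")
```

With only one op per sample, as in the systolic case, the processor sits mostly idle even with the bus saturated. The algorithm would need on the order of 15 flops per sample before the CPU, rather than the I/O, became the limit.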
I guess.. any of these exceedingly fine grained processes puts a heavy
burden on the interconnect, and faster is always better, especially if it
allows you to use commodity processors and commodity C programmers in place of
custom ASICs and chip designers at a million bucks a spin.
> > The interconnect speed is what drives folks to incredibly expensive ASIC or
> > FPGA solutions.
Yep. But, just as folks like using clusters built of commodity computers in
preference to specialized supercomputers in the more traditional HPC world,
the same is true in signal processing. All the good stuff about Beowulfs
- cheap hardware to get started
- scalability so you can start small (cheap)
- easy access to tools (e.g. gcc, all manner of libraries, matlab/octave)
- low learning curve to get started
applies to the signal processing world as well.
James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109