[Beowulf] [jak at uiuc.edu: Re: [APPL:Xgrid] [Xgrid] Re: megaFlops per Dollar? real world requirements]

Sat May 14 08:27:27 PDT 2005

----- Original Message -----
From: "Robert G. Brown" <rgb at phy.duke.edu>

> On Sun, 15 May 2005, Eugen Leitl wrote:
>
> > ----- Forwarded message from "Jay A. Kreibich" <jak at uiuc.edu> -----
> > On Thu, May 12, 2005 at 01:45:45PM -0500, Jay A. Kreibich scratched on the wall:
>
> (A loverly discussion of IPoFW)
>
> Hey, Jay, over here in beowulf-land WE appreciate your analysis.  I
> found it very useful and entirely believable.  In fact, I've seen very
> similar behavior to that which you describe in some low end (RTL 8139)
> 100 Mbps ethernet adapters -- the bit about network throughput going
> absolutely to hell in a handbasket if the network becomes congested
> enough to saturate the cards.  In the case of the RTL, I recall it was a
> buffering issue (or rather, the lack thereof) so overruns were dropped
> and a full-out TCP stream could drop to 2 Mbps on a particularly bad
> run.
>
> We also understand the difference between "latency" (which usually
> dominates small packet/message transfer) and "bandwidth" (which is
> usually wirespeed less mandatory overhead for some optimal pattern of
> data transfer, e.g. large messages in TCP/IP).  In fact, your whole
> article was very, very cogent.
>
> Thanks!
>
>    rgb
>
> >
>

Don't forget that the highest data rate available is FedEx'ing a box of
disk drives.

I can shed a bit of light on how IEEE-1394 works (FireWire is Apple's trade
name for it).
As Jay mentioned, it's optimized for bulk streaming transfers; live video is
a fine example.  It also supports isochronous transfers, where data is
delivered at a constant rate (which reduces buffering needs for things like
video).

Connecting multiple 1394 widgets together is simple to do, but the
underlying network management is somewhat complex.
Consider the following configuration:

A : B : C : D

That is, A is connected to B, B is connected to C, and C is in turn
connected to D.

As you plug and unplug cables, the network renegotiates and reconfigures
its routing tables in preparation for a "connection request" from one node
asking to connect to another.  When the request is issued, a path through
the network is established for the duration.  (Think of routing a video
signal from a camera through a recorder to a TV.)  Typical 1394 devices have
2 external ports plus an internal port, interconnected at a sort of "hub",
but it's not a passive hub.  It has significant smarts, so a "switch" might
be a better conceptual model.

Much like USB, each node has a particular maximum speed, so in our example
above, if node B happened to be a slow-speed node, then the A-C path would
be limited to B's speed.  One could, of course, install multiple 1394
interfaces in devices (building a network of point-to-point links).
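
To make the slow-node effect concrete, here is a rough Python sketch of the
A : B : C : D chain.  It only models the "slowest node on the path wins" rule
described above; the per-node speeds are made-up values (chosen from the
nominal 100/200/400 Mbit/s rates), and path_speed is a hypothetical helper,
not anything from a real 1394 stack.

```python
# Hypothetical model of the A : B : C : D daisy chain.
# Node speeds are invented for illustration (nominal Mbit/s).
chain = ["A", "B", "C", "D"]
max_speed_mbps = {"A": 400, "B": 100, "C": 400, "D": 400}


def path_speed(src, dst):
    """Fastest rate usable between src and dst on the chain.

    Every node on the path, endpoints included, has to handle the
    rate, so the slowest one sets the limit.
    """
    i, j = sorted((chain.index(src), chain.index(dst)))
    return min(max_speed_mbps[node] for node in chain[i:j + 1])


print(path_speed("A", "C"))  # path runs through slow node B
print(path_speed("C", "D"))  # B is not on this path
```

With these assumed speeds, A-C is held to B's rate while C-D is not, which
is why a single slow device in the middle of a chain hurts everything that
has to pass through it.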

The other wrench in the works is that the raw bandwidth between two nodes is
divided up into channels (just like buying a T-1 data wire... you can run
one fat pipe, several 384 kbps H.320 links, or 64 kbps voice channels, or
any combination).  If you're just streaming sound, for instance, you can
allocate an isochronous channel to carry that bandwidth, or, if you need
video, you allocate more bonded isochronous channels.  There's always a
low-rate channel available for "ad-hoc" messages.
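
The T-1 analogy can be put into a few lines of Python.  This is only a
sketch of the analogy, not of how 1394 actually hands out isochronous
bandwidth: a T-1 carries 24 slots of 64 kbps each, and the allocate function
and the request names below are hypothetical.

```python
DS0_KBPS = 64   # one T-1 time slot
T1_SLOTS = 24   # slots per T-1 (24 x 64 = 1536 kbps of payload)


def allocate(requests_kbps):
    """Give each request whole 64 kbps slots, in the order asked.

    Returns (slots per request, slots left over) and raises
    ValueError once the T-1 would be oversubscribed.
    """
    slots = {}
    used = 0
    for name, kbps in requests_kbps.items():
        need = -(-kbps // DS0_KBPS)  # ceiling division
        if used + need > T1_SLOTS:
            raise ValueError(f"not enough slots left for {name}")
        slots[name] = need
        used += need
    return slots, T1_SLOTS - used


# Two 384 kbps video links plus two voice channels (made-up mix).
slots, free = allocate(
    {"video-1": 384, "video-2": 384, "voice-1": 64, "voice-2": 64}
)
print(slots, "free slots:", free)
```

Each 384 kbps link is six bonded slots, each voice channel is one, and
whatever is left over stays available for other traffic.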

This is a gross oversimplification, and I've not necessarily used 1394
terminology.

The upshot is that 1394 is probably a terrible communications method (in
its usual implementation) for a general cluster computer.  If you had a
very tightly coupled task, with very deterministic execution times on the
nodes, then you might be able to take advantage of the high rate/isochronous
channels, but this would be like designing a special purpose systolic array
processor.