[Beowulf] fast interconnects
Jim Lux
James.P.Lux at jpl.nasa.gov
Sat May 20 14:54:42 PDT 2006
At 10:16 AM 5/20/2006, Mark Hahn wrote:
> > As for bit error rates.. 10^-15 is the going in worst case, and 10^-18 is
> > the typical design point. A bit of a challenge to test the latter however
> > (10^18 bits at 10^10 bits/sec takes 10^8 seconds)
>
>I did a little research, and could only find reference to 10^-12
>as the target BER for 10gbase-t. I'm not sure how much this would
>matter though - surely people would still use the usual higher-level
>checksum/retransmission, no?
Not necessarily, because frame errors reduce channel capacity and require
larger buffer space, especially if the transit time through the link is
greater than the time it takes to send a message (e.g. a 15 kbit frame
takes only 15 nanoseconds at 1 Tbps, and that's about 3 meters of fiber).
You really would rather the data get to the other end un-errored than rely
on any sort of ack/nak/go-back-N kind of protocol.
A similar problem exists today with long latency links for TCP/IP.
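
To put rough numbers on the frame-time argument, here is a back-of-envelope
sketch. The fiber group index (1.47), the 100 m link length in the usage
note, and the assumption of independent bit errors are mine, chosen for
illustration; the function names are made up for this example.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s
FIBER_INDEX = 1.47        # ASSUMED group index of silica fiber (illustrative)


def frame_time_s(frame_bits, rate_bps):
    """Time to clock one frame onto the link, in seconds."""
    return frame_bits / rate_bps


def frame_length_in_fiber_m(frame_bits, rate_bps, index=FIBER_INDEX):
    """Length of fiber that one frame occupies while in flight, in meters."""
    return frame_time_s(frame_bits, rate_bps) * C_VACUUM / index


def frames_in_flight(link_m, frame_bits, rate_bps, index=FIBER_INDEX):
    """How many frames fit on the link at once (one direction)."""
    return link_m / frame_length_in_fiber_m(frame_bits, rate_bps, index)


def frame_error_probability(frame_bits, ber):
    """Chance that a frame has at least one bit error.

    ASSUMES independent bit errors: 1 - (1 - ber)**frame_bits, computed
    with log1p/expm1 so that tiny BER values do not round away.
    """
    return -math.expm1(frame_bits * math.log1p(-ber))


if __name__ == "__main__":
    bits, rate = 15_000, 1e12
    print("frame time (ns):", frame_time_s(bits, rate) * 1e9)
    print("frame length in fiber (m):", frame_length_in_fiber_m(bits, rate))
    print("frames in flight on 100 m:", frames_in_flight(100.0, bits, rate))
    print("frame error prob at BER 1e-12:", frame_error_probability(bits, 1e-12))
```

With these assumptions a 100 m link holds a few dozen frames at once, so any
go-back-N scheme has to buffer (and possibly resend) all of them when a
single frame is hit.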
So, at some point, it really starts to pay to do some "coding",
particularly Forward Error Correction. The 10GigE thing (10GBASE-T), for
instance, contemplates the use of a (2048,1723) Low Density Parity Check
code: each 2048-bit block carries 1723 data bits protected by 325 parity
bits.
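
For reference, in the usual (n, k) block code notation n is the total block
length and k is the number of data bits, so the parity count is n - k. The
small sketch below works out the overhead for the (2048,1723) LDPC code
specified for 10GBASE-T in IEEE 802.3an; the function name and the returned
field names are made up for this example.

```python
def code_parameters(n, k):
    """Basic figures for an (n, k) block code.

    n: total bits per code block, k: data bits per code block.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    parity = n - k
    return {
        "parity_bits": parity,     # redundancy added per block
        "code_rate": k / n,        # fraction of the line rate carrying data
        "overhead": parity / k,    # extra bits sent per data bit
    }


if __name__ == "__main__":
    # (2048, 1723) is the LDPC block size used by 10GBASE-T.
    for name, value in code_parameters(2048, 1723).items():
        print(name, value)
```

So the price of not needing retransmission is roughly 16% of the raw line
rate spent on parity.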
> also, from what I read, the main
>concern wrt BER is length-related insertion loss. obviously, if
>the system can manage 10^-12 at 100M, it'll have a much easier time
>for inside-machineroom runs (say, 15M) or in-cluster (<10 most of
>the time).
At a low level (chip to chip, say) and even at a box-to-box level, avoiding
any sort of retry mechanism has great value (imagine having a
parity check in the middle of your CPU pipeline, where a bit error would
trigger a pipeline flush: you'd rather use error correcting codes and keep
the pipeline running).
At high data rates, and relatively short runs (<300m), the error rate tends
not to scale with length, because, particularly for lower speed
implementations today (1 Gbps and lower), the dominant source of problems
is related to interfaces (connectors, etc.) and Tx/Rx crosstalk (near end
and far end), which are "end point" related. There is a length issue,
because of the attenuation in the medium, but, to a certain extent that can
be overcome by just increasing the power pushed into the link (which
aggravates the crosstalk problem, but in a more manageable
way). Basically, once the bits are in the wire, the distance they travel
isn't as much of an issue.
James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875