[Beowulf] Channel bonding, again

Kilian CAVALOTTI kilian at stanford.edu
Mon Oct 15 10:24:06 PDT 2007


Hi Carsten,

On Sunday 14 October 2007 09:19:33 am Carsten Aulbert wrote:
> I don't know off the top of my head which versions we have used, but
> our problem with LACP was that the switch (mostly ProCurve 2900, but
> I think the Cisco 4948 behaved similarly, though I would need to
> cross-check that) was using only one of the available two or four
> lines to the node for a single connection. Thus a node could handle
> two different 1 Gb/s connections at the same time, reaching almost
> 2 Gb/s in total, but we never saw a single connection using all the
> available bandwidth.
>
> That was the reason our student came up with this VLAN trick.

Indeed, with a trunked LACP link, a single connection will only go over 
one link. But you can have up to your-number-of-trunk-lines transfers 
going wire-speed at the same time. I guess it all depends on what you 
need. :)
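
To make that concrete, here is a minimal Python sketch of per-flow link 
selection. The hash (XOR of the two MAC addresses, modulo the number of 
links) and the names pick_link and aggregate_gbps are my own 
illustration, not what any particular switch or driver actually does; 
real devices use their own, often configurable, hash policies.

```python
# Hypothetical sketch of per-flow link selection on an aggregated trunk.
# The hash is illustrative only; real switches and bonding drivers use
# their own (configurable) hash policies.

def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Map a (source, destination) MAC pair to one link index."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_links


def aggregate_gbps(flows, n_links: int, link_gbps: float = 1.0) -> float:
    """Upper bound on total throughput for a set of (src, dst) flows.

    Every packet of a flow hashes to the same link, so a flow can never
    exceed link_gbps; the total is capped by the number of distinct
    links that the flows happen to land on.
    """
    busy_links = {pick_link(src, dst, n_links) for src, dst in flows}
    return len(busy_links) * link_gbps


if __name__ == "__main__":
    server = "00:00:00:00:00:10"
    one_flow = [("00:00:00:00:00:01", server)]
    two_flows = one_flow + [("00:00:00:00:00:02", server)]
    print(aggregate_gbps(one_flow, n_links=2))
    print(aggregate_gbps(two_flows, n_links=2))
```

Because the link is chosen per flow, two flows can also hash onto the 
same link, in which case they share a single GigE line even though the 
other one is idle.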

We're using LACP to aggregate links between our users and our cluster 
firewall, like:

user \
      \             trunk               trunk
user --  | switch | ====== | firewall | ===== cluster
      /
user /

So in that setup, each user's individual connection is limited by its 
own NIC (and often disk I/O), which is at most GigE. Our goal is to 
let more than one user transfer data at Gbps speeds at the same time.

I guess that in our case the VLAN trick couldn't really work since, 
if I understood correctly, the switch has to be the receiving end for 
the aggregation to work. For instance, the trunking host can send data 
using several links, but it can only receive using one, because the 
switch can't load-balance and has to choose one interface/VLAN to send 
data through. Is that right?

I'm quite surprised balance-tlb could crash a node too, but I haven't 
tried it recently.

Cheers,
-- 
Kilian


