[Beowulf] bonding.txt really confusing, why don't I get higher aggregate bandwidth from multiple TCP connections from multiple gigabit clients with balance-alb bonding on a server?
Sabuj Pattanayek sabujp at gmail.com
Fri Feb 20 13:55:24 PST 2009
Hi,
I posted this to the gentoo-cluster list and was told to look in the
archives here. After searching the beowulf archives for bonding and
"balance-alb", I found no really clear answers regarding balance-alb
and multiple TCP connections. Furthermore, bonding.txt is really
confusing and uses terms interchangeably, so I'll post my question here
as well.
I've got the following system:
Linux server 2.6.26 #5 SMP Fri Feb 6 12:18:54 CST 2009 ppc64 PPC970FX,
altivec supported RackMac3,1 GNU/Linux
It is connected to a Cisco gigabit switch, with both gigabit NICs bonded as follows:
% cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0d:93:9e:2b:ca

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:0d:93:9e:2b:cb
Both eth0 and eth1 are running at 1000 Mb/s full duplex according to ethtool.
I start netserver (from netperf) on the server and then run netperf on
four other clients, each with a single gigabit link to the same switch, with:
netperf -l 45 -t TCP_STREAM -H server
but I get the following outputs:
client 1:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    45.04    405.68

client 2:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    45.01    216.07

client 3:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    45.87    143.93

client 4:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    45.05    164.05
Adding those up gives me:
$ echo "405.68+216.07+143.93+164.05" | bc -l
929.73
That is no more than 1 Gb/s, i.e. about one NIC's worth.
bonding.txt from the kernel docs says:
"balance-rr: This mode is the only mode that will permit a single
TCP/IP connection to stripe traffic across multiple
interfaces. It is therefore the only mode that will allow a
single TCP/IP stream to utilize more than one interface's
worth of throughput."
That much is understood: for a *single* TCP connection (or stream) I
can't get more than a single NIC's bandwidth unless I have balance-rr
enabled and the matching switch ports grouped into a static
EtherChannel (trunk). However:
"balance-tlb: The balance-tlb mode balances outgoing traffic by peer.
Since the balancing is done according to MAC address, in a
"gatewayed" configuration (as described above), this mode will
send all traffic across a single device. However, in a
"local" network configuration, this mode balances multiple
local network peers across devices in a vaguely intelligent
manner (not a simple XOR as in balance-xor or 802.3ad mode),
so that mathematically unlucky MAC addresses (i.e., ones that
XOR to the same value) will not all "bunch up" on a single
interface."
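
For contrast, here is a minimal sketch of the "simple XOR" the passage refers to, assuming the layer-2 policy that bonding.txt gives for balance-xor: (source MAC XOR destination MAC) modulo slave count. The server MAC is the eth0 address from the bond0 output above; the client MACs are made up for illustration, and the real driver may differ in detail.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Layer-2 style hash: (source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


# eth0's permanent address from /proc/net/bonding/bond0 above.
SERVER_MAC = "00:0d:93:9e:2b:ca"

# Hypothetical client addresses (not from the original post).
CLIENT_MACS = [
    "00:11:22:33:44:10",
    "00:11:22:33:44:12",
    "00:11:22:33:44:14",
    "00:11:22:33:44:16",
]

if __name__ == "__main__":
    for mac in CLIENT_MACS:
        print(mac, "-> slave index", xor_slave_index(SERVER_MAC, mac, 2))
```

With two slaves the hash reduces to the lowest bit of the two addresses, so these four clients, whose last octets are all even like the server's, would all map to slave index 0. That is the "bunching up" balance-tlb is said to avoid by assigning peers according to load instead.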
What I have is a "local" network configuration with a single switch
and multiple clients. Furthermore:
"balance-alb: This mode is everything that balance-tlb is, and more.
It has all of the features (and restrictions) of balance-tlb,
and will also balance incoming traffic from local network
peers (as described in the Bonding Module Options section,
above)."
So what that says to me is that with balance-alb the bonding driver
should "balance multiple local network peers across devices [NICs] in
a vaguely intelligent manner (not a simple XOR as in balance-xor or
802.3ad mode), so that mathematically unlucky MAC addresses (i.e.,
ones that XOR to the same value) will not all "bunch up" on a single
interface." similar to balance-tlb but for "incoming traffic from
local network peers" as well as outgoing.
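
As a rough model of that expectation: the Bonding Module Options section of bonding.txt describes balance-alb's receive balancing as ARP negotiation, where the driver rewrites the source hardware address in outgoing ARP replies so that different peers learn different slave MACs, handed out sequentially. The sketch below simulates only that assignment step. The peer IP addresses are hypothetical, and the real driver also considers slave speed and rebalances over time.

```python
from collections import Counter
from itertools import cycle


def assign_receive_slaves(peer_ips, slave_macs):
    """Hand out slave MAC addresses to peers in round-robin order.

    Simplified stand-in for the address each peer would learn from
    the server's ARP reply under balance-alb.
    """
    rotation = cycle(slave_macs)
    return {ip: next(rotation) for ip in peer_ips}


# Permanent addresses of eth0 and eth1 from the bond0 output above.
SLAVE_MACS = ["00:0d:93:9e:2b:ca", "00:0d:93:9e:2b:cb"]

# Hypothetical client IP addresses (not from the original post).
PEER_IPS = ["192.168.0.11", "192.168.0.12", "192.168.0.13", "192.168.0.14"]

if __name__ == "__main__":
    assignment = assign_receive_slaves(PEER_IPS, SLAVE_MACS)
    for ip, mac in assignment.items():
        print(ip, "sends to", mac)
    print(Counter(assignment.values()))
```

Under this model, four clients and two slaves would split two and two. Looking at each client's ARP/neighbour table entry for the server is one way to see which slave MAC it actually learned.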
But that is not what I see in the test above. Shouldn't I be
able to get well over 1 Gb/s with balance-alb mode from multiple TCP
streams? It looks like all the connections are bunching up on one
interface. If, for whatever reason, what I'm trying to do isn't possible
unless I can turn on LACP in the switch, then what's the point of
balance-alb or balance-tlb? This seems like active-backup to me, except
that both NICs are enabled and share the load, each at roughly half
capacity.
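
One way to check whether the streams really are bunching up on one interface is to compare each slave's byte counters before and after a netperf run. This is a minimal sketch, assuming the standard Linux sysfs counters under /sys/class/net/<iface>/statistics/ and the slave names eth0 and eth1 from the bond0 output; the function names are my own.

```python
from pathlib import Path


def read_counters(ifaces, base="/sys/class/net"):
    """Read cumulative rx/tx byte counters for each interface from sysfs."""
    counters = {}
    for iface in ifaces:
        stats = Path(base) / iface / "statistics"
        counters[iface] = {
            "rx": int((stats / "rx_bytes").read_text()),
            "tx": int((stats / "tx_bytes").read_text()),
        }
    return counters


def traffic_share(before, after, direction):
    """Fraction of the bytes moved between two snapshots, per interface."""
    deltas = {i: after[i][direction] - before[i][direction] for i in before}
    total = sum(deltas.values())
    return {i: (d / total if total else 0.0) for i, d in deltas.items()}


if __name__ == "__main__":
    slaves = ["eth0", "eth1"]
    before = read_counters(slaves)
    input("Run the netperf clients now, then press Enter... ")
    after = read_counters(slaves)
    for direction in ("rx", "tx"):
        print(direction, traffic_share(before, after, direction))
```

Since TCP_STREAM sends data from the netperf client to netserver, the rx share on the server is the interesting number here: a split near 1.0 and 0.0 would mean receive traffic is landing on a single slave.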
Thanks,
Sabuj Pattanayek
