Channel-bonding/VLAN with Scyld

Mike Weller weller at zyvex.com
Mon Jul 9 16:56:53 PDT 2001


Hello,

I sent an email to the list last week regarding configuring our HP
Procurve 4000m switch for channel-bonding.  I am still having major
problems!!

With the OS configured for channel-bonding but no switch configuration
at all, I got only 17Mbps :-(  When I turned on trunking for the
relevant ports on the switch, I got about 100Mbps, which was only a
slight improvement over a single NIC (and since the trunk uses SA/DA
hashing, there was no node-to-node bandwidth improvement).
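
For reference, a quick way to check that the bonding driver itself is
round-robining transmits would be to watch the per-NIC TX byte counters
on a slave during a big transfer; they should climb at about the same
rate on both NICs no matter what the switch then does with the frames.
(Rough sketch; the grep pattern assumes a net-tools ifconfig that
prints byte counters.)

#!/bin/sh
# Print the byte counters of both slave NICs every few seconds.
while true
do
    ifconfig eth0 | grep bytes
    ifconfig eth1 | grep bytes
    echo ----
    sleep 5
done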

I got a response suggesting that I set up 2 VLANs, with all the eth0's
on VLAN-1 and all the eth1's on VLAN-2.  The responder said he could
get 190Mbps that way, and that the trunking configuration is really
meant for switch-to-switch links.  He was using an HP 2400 (or was it
2424?).  The manual I have covers both models, so I assume the same
approach will work with mine as well.

I am still having major difficulties with getting it to work with
Scyld.

I telnetted to my switch and added VLAN1 and VLAN2.  The ports for all
the slaves' eth0's were put on VLAN1 and their eth1's on VLAN2.
(Note: the master has 3 NICs, so its eth1 went on VLAN1 and its eth2
on VLAN2.)  I did not configure any overlap between the VLANs.
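
For reference, a quick way to double-check that the two VLANs really
are isolated would be to sniff both of the master's slave-facing NICs
and compare the broadcast traffic they see; the slaves' boot-time
broadcasts should only ever show up on the VLAN1 side.  (Rough sketch;
assumes tcpdump is installed, and eth2 has to be up to listen on.)

ifconfig eth2 up                # no address needed, just up for sniffing
# Run these two in separate shells:
tcpdump -n -i eth1 broadcast    # should see the slaves' boot broadcasts
tcpdump -n -i eth2 broadcast    # should stay quiet if the VLANs don't overlap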

To get a slave to boot at all, I had to temporarily take the bond on
the master down and bring it back up with only eth1 (the VLAN1 side)
enslaved:

 /etc/init.d/beowulf stop
 ifconfig bond0 down
 ifconfig eth2 down
 ifconfig eth1 down
 ifconfig bond0 inet 10.0.0.1 netmask 255.255.255.0
 ifconfig eth1 inet 10.0.0.1 netmask 255.255.255.0
 ifenslave bond0 eth1
 /etc/init.d/beowulf start

After doing so, the slave node was able to boot over eth0, grabbing
its image from the master's eth1.
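
As a sanity check at this point, bpstat on the master should show the
node as up, and bpstat -a should hand back its 10.0.0.x address
(output details may differ between Scyld releases):

bpstat          # node state overview
bpstat -a 0     # address of node 0, for example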

Now the question is, how am I supposed to channel-bond the nodes
after this point?

When ALL NICs were part of the same VLAN, these scripts used to work:

SLAVE:  (no idea if there's a better way)
#!/bin/csh -f
# Takes a node number as its only argument.
set node=$1
# Load the bonding module on that node.
modprobe --node $node bonding
# Build a small script on the master; the backticks run bpstat *here*,
# so the node's address gets baked into /tmp/runme before it is copied.
cat <<EOF > /tmp/runme
 ifconfig eth0 inet `bpstat -a $node` netmask 255.255.255.0
 ifconfig bond0 inet `bpstat -a $node` netmask 255.255.255.0
 ifenslave bond0 eth0
 ifenslave bond0 eth1
EOF
# Copy the script to the node and run it there.
bpcp /tmp/runme ${node}:/tmp
bpsh $node nohup csh -f /tmp/runme

MASTER:
# Enslave the master's VLAN2 NIC into the existing bond.
ifenslave bond0 eth2
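
For completeness, the way these would get run across the whole cluster
is roughly the following (the script path and the node count of 8 are
placeholders; the SLAVE script above doesn't really have a name):

#!/bin/sh
# Bond every slave, then enslave the master's second bonded NIC.
n=0
while [ $n -lt 8 ]
do
    /usr/local/sbin/bond-slave $n
    n=`expr $n + 1`
done
ifenslave bond0 eth2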


Now that the NICs are on 2 distinct VLANs, the SLAVE script "hangs",
which makes sense: its eth1 interface is transmitting half of the
packets onto an isolated VLAN for which the master's eth2 had not yet
been configured.  Running the MASTER line immediately afterwards did
not remedy the problem.
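
One way to see what is really happening on the isolated side would be
to sniff the master's eth2 while the SLAVE script runs, to see whether
the slave's VLAN2 transmits are reaching the master at all (10.0.0.100
is a made-up slave address; tcpdump has to be installed on the master):

ifconfig eth2 up                     # up, no address, just to listen
tcpdump -n -i eth2 host 10.0.0.100   # frames arriving here go unanswered
                                     # until eth2 is enslaved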


