Trouble with Bonding Broken MPI

Todd Broucksou tbroucks at cnidr.org
Tue Jun 25 11:17:16 PDT 2002


All,
  I have a RedHat 7.2, kernel 2.4.9-31 cluster on 8 Athlon MPs. I recently 
added a second 3c590 NIC to each node to bring up channel bonding. According 
to ifconfig -a, bond0 is up. I can ping across the cluster without any packet 
loss, and I can ftp within the cluster without any loss. But both the LAM and 
MPICH mpirun are broken. I can add servers with lamboot or pg4 but cannot run 
mpirun; I get a network error. LAM and MPICH both worked before the "upgrade" 
to channel bonding.
  I have tried two separate Cisco switches, and also a single Cisco switch 
running EtherChannel; mpirun-type programs still do not work.
  I have even tried recompiling MPICH, with no change in the result.
  Any help would be appreciated.




More information about the Beowulf mailing list