Trouble with Bonding Broken MPI
Todd Broucksou
tbroucks at cnidr.org
Tue Jun 25 11:17:16 PDT 2002
All,
I have a Red Hat 7.2 cluster (kernel 2.4.9-31) on 8 Athlon MPs. I recently
added a second 3c590 NIC to bring up channel bonding. According to
ifconfig -a, bond0 is up. I can ping across it without any packet loss, and
I can ftp within the cluster without any loss. But mpirun is broken under
both LAM and MPICH. I can add servers with lamboot or pg4, but I cannot run
mpirun; I get a network error. Both LAM and MPICH worked before the
"upgrade" to channel bonding.
I have tried two separate Cisco switches, and also a single Cisco switch
running EtherChannel. Neither setup works with mpirun-type programs.
I have even tried recompiling MPICH, with no change in the result.
Any help would be appreciated.