[Beowulf] IPoIB failure
samuel at unimelb.edu.au
Tue Jan 27 15:57:50 PST 2015
On 28/01/15 10:34, Skylar Thompson wrote:
> We've had some problems with the RHEL-provided OFED stack interfering
> with the Mellanox one. One of the symptoms we've experienced in the past
> is that some IB services (like RDMA) work while others (like IPoIB) don't.
> Using the Mellanox install script in the MLNX OFED package clears this
> up. I wonder if this is what's going on?
Quite possible. We deliberately run only the RHEL OFED everywhere
because of the BG/Q.
When we brought up our most recent Intel cluster in 2013, we were advised
to use Mellanox OFED on it, but we found we had massive problems talking to
the GPFS NSDs. Those had to run RHEL OFED to talk to the BlueGene/Q on a
different switch (the NSDs are on both IB switches), so we ditched MLNX
OFED and went back to RHEL.
All the best,
Christopher Samuel, Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545