[Beowulf] Mellanox ConnectX-3 MT27500 problems

Josh Catana jcatana at gmail.com
Sat Apr 27 16:52:46 PDT 2013


I have noticed that on systems running the Xen kernel's netback driver for
virtualization, bandwidth drops to very low rates.
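
A quick way to rule that in or out is to check whether netback is actually
in the path (a minimal sketch; the module and interface names are
assumptions for a stock Xen dom0):

$ lsmod | grep -E 'netback|mlx4'    # xen_netback loaded => Xen network path
$ ethtool -i eth0                   # the "driver:" field shows what backs eth0

If xen_netback is loaded, comparing bandwidth measured from dom0 directly
against a guest should show whether the virtualization layer is the
bottleneck.
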
On Apr 27, 2013 6:19 PM, "Brice Goglin" <brice.goglin at gmail.com> wrote:

> Hello,
>
> These cards are QDR and even FDR capable, so you should get 56Gbit/s (we
> see about 50Gbit/s in benchmarks, iirc). That's what I get on Sandy Bridge
> servers with the exact same IB card model.
>
> $ ibv_devinfo -v
> [...]
>             active_width:        4X (2)
>             active_speed:        14.0 Gbps (16)
>
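>
> (For reference, that active_width/active_speed pair is FDR; the raw math,
> as a sanity check:
>
>     4 lanes x 14.0625 Gbit/s        = 56.25 Gbit/s signalling
>     56.25 Gbit/s x 64/66 encoding   ~ 54.5 Gbit/s usable data
>
> which is why ~50Gbit/s in benchmarks is about right.)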
>
> These nodes have been running Debian testing/wheezy (default kernel and
> IB packages) for 9 months without problems.
>
> I had to fix the cables to get a 56Gbit/s link state. Without Mellanox FDR
> cables, I was only getting 40Gbit/s. So maybe check your cables. And if
> you're not 100% sure about your switch, try connecting the nodes
> back-to-back.
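>
> To see what each link actually negotiated, something like this should work
> (a sketch; the device name, port number, and hostname are assumptions):
>
> $ ibstat mlx4_0 1 | grep -E 'State|Rate'
>         State: Active
>         Rate: 56
> $ iblinkinfo | grep -i <hostname>
>
> The Rate line should read 56 for FDR, 40 for QDR, 20 for DDR.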
>
> You can try upgrading the IB card firmware too. Mine is 2.10.700 (likely
> not up to date anymore, but at least this version works fine).
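>
> The running firmware version shows up in ibv_devinfo, so you can check it
> without rebooting:
>
> $ ibv_devinfo | grep fw_ver
>         fw_ver:                         2.10.700
>
> and mstflint can query or burn images by PCI address if you need to
> upgrade (a sketch; the PCI address here is an assumption):
>
> $ mstflint -d 01:00.0 query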
>
> Where does your "8.5Gbit/s" come from? IB link status or benchmarks? If
> benchmarks, it could be related to the PCIe link speed. Upgrading the BIOS
> and IB firmware helped me too (some reboots brought the card up at PCIe
> Gen1 instead of Gen3). Here's what you should see in lspci if you get PCIe
> Gen3 x8 as expected:
>
> $ sudo lspci -d 15b3: -vv
> [...]
>     LnkSta:    Speed 8GT/s, Width x8
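>
> If the link came up degraded you'd see 2.5GT/s (Gen1) or 5GT/s (Gen2) on
> that LnkSta line instead. For the arithmetic: Gen3 x8 is 8GT/s per lane
> with 128b/130b encoding, about 63Gbit/s usable, so anything less than
> Gen3 x8 cannot feed an FDR link at full rate.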
>
>
> Brice
>
> On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
> > Dear all,
> >
> > I was wondering whether somebody has had similar problems to mine.
> >
> > We have recently purchased a bunch of new nodes. These are Sandy Bridge
> > machines with Mellanox ConnectX-3 MT27500 InfiniBand adapters, and this
> > is where my problems started.
> >
> > I usually use Debian Squeeze for my clusters (kernel 2.6.32-5-amd64).
> > Unfortunately, as it turned out, I cannot use that kernel as my Intel
> > NIC is not supported by it. So I upgraded to 3.2.0-0.bpo.2-amd64 (the
> > backport kernel for Squeeze). With that I got the network working, but
> > the InfiniBand is not: the device is not even recognized by ibstatus.
> > Thus, I decided to do an upgrade (not dist-upgrade) to Wheezy to get
> > the newer OFED stack.
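> >
> > (To confirm which driver stack actually loads after such an upgrade, I
> > run a quick check; a sketch, assuming the mlx4 driver:
> >
> > $ modinfo mlx4_core | grep -E '^version|^filename'
> > $ dmesg | grep -i mlx4
> >
> > If mlx4_core never appears in dmesg, the card was not probed at all.)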
> >
> > With Wheezy I get the InfiniBand working, but only at 8.5 Gb/sec. A
> > simple reseating of the plug increases that to 20 Gb/sec (4X DDR), which
> > is still slower than the speed of the older nodes (40 Gb/sec, 4X QDR).
> >
> > So I upgraded completely to Wheezy (dist-upgrade this time), but the
> > problem did not vanish. I then re-installed Squeeze with a vanilla
> > kernel (3.8.8) and the latest OFED stack from the Mellanox site, and
> > guess what: the same experience. After a reboot the InfiniBand speed is
> > 8.5 Gb/sec and reseating the plug increases that to 20 Gb/sec. It does
> > not matter whether I connect to the edge switch or to the main switch;
> > in both cases I see the same behaviour.
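> >
> > For reference, this is how I am reading the rate (a sketch of the
> > ibstatus output, trimmed; the older QDR nodes report 40 Gb/sec (4X QDR)
> > on the same line):
> >
> > $ ibstatus
> > Infiniband device 'mlx4_0' port 1 status:
> > [...]
> >         rate:            20 Gb/sec (4X DDR)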
> >
> > Frankly, I am out of ideas now. I don't think the observed speed change
> > after reseating the plug should happen. I am in touch with technical
> > support as well, but I think we are both a bit confused.
> >
> > Now, am I right to assume that the Mellanox ConnectX-3 MT27500 are QDR
> > cards, so I should get 40 Gb/sec and not 20 Gb/sec?
> >
> > Has anybody had similar experiences? Any ideas?
> >
> > All the best from London
> >
> > Jörg
> >
> >
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>