[Beowulf] Mellanox ConnectX-3 MT27500 problems

Brice Goglin brice.goglin at gmail.com
Sat Apr 27 15:18:48 PDT 2013


Hello,

These cards support QDR and even FDR, so you should get 56 Gbit/s (we see
about 50 Gbit/s in benchmarks, iirc). That's what I get on Sandy Bridge
servers with the exact same IB card model.

$ ibv_devinfo -v
[...]
            active_width:        4X (2)
            active_speed:        14.0 Gbps (16)
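
As a rough cross-check, the two fields above can be turned into a link rate.
This is a minimal Python sketch, not part of any IB tool: the function name
link_rates and the ENCODING table are made up for illustration, and the regexes
assume the field layout shown in the sample output. The encoding efficiencies
are the standard ones (8b/10b for SDR/DDR/QDR, 64b/66b for FDR).

```python
import re

# Per-lane signalling rate as printed by ibv_devinfo -> encoding efficiency.
# SDR/DDR/QDR use 8b/10b, FDR uses 64b/66b. (Illustrative table, keyed on the
# rounded per-lane value shown in the sample output.)
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 10.0: 8 / 10, 14.0: 64 / 66}


def link_rates(devinfo_text):
    """Return (signalling, data) rates in Gbit/s from `ibv_devinfo -v` text."""
    width = int(re.search(r"active_width:\s+(\d+)X", devinfo_text).group(1))
    lane = float(re.search(r"active_speed:\s+([\d.]+) Gbps", devinfo_text).group(1))
    signalling = width * lane
    return signalling, signalling * ENCODING[lane]


sample = """
            active_width:        4X (2)
            active_speed:        14.0 Gbps (16)
"""
signalling, data = link_rates(sample)
print(f"{signalling:.1f} Gbit/s signalling, {data:.1f} Gbit/s data")
```

With 4X at 14.0 Gbps per lane this gives the 56 Gbit/s link state, and a data
rate a little above 54 Gbit/s once the 64b/66b encoding is accounted for, so
about 50 Gbit/s in benchmarks is close to the ceiling. A 4X DDR link (5.0 Gbps
per lane) comes out at 20 Gbit/s signalling by the same arithmetic.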


These nodes have been running Debian testing/wheezy (default kernel and
IB packages) for 9 months without problems.

I had to fix the cables to get a 56 Gbit/s link state. Without Mellanox FDR
cables, I was only getting 40 Gbit/s. So maybe check your cables. And if
you're not 100% sure about your switch, try connecting the nodes back-to-back.

You can try upgrading the IB card firmware too. Mine is 2.10.700 (likely
not up to date anymore, but at least this one works fine).

Where does your "8.5Gbit/s" come from? IB status or benchmarks? If
benchmarks, it could be related to the PCIe link speed. Upgrading the
BIOS and IB firmware helped me too (some reboots gave PCIe Gen1 instead of
Gen3). Here's what you should see in lspci if you get PCIe Gen3 x8 as
expected:

$ sudo lspci -d 15b3: -vv
[...]
    LnkSta:    Speed 8GT/s, Width x8
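
To see why the PCIe generation matters for benchmarks, the LnkSta line can be
converted to a raw bandwidth figure. Again a minimal Python sketch under
assumptions: pcie_gbits and PCIE_ENCODING are illustrative names, the regex
assumes the "Speed ...GT/s, Width x..." layout shown above, and the result
ignores PCIe packet overhead, so real throughput is somewhat lower.

```python
import re

# Per-lane signalling rate (GT/s) -> encoding efficiency.
# Gen1 (2.5 GT/s) and Gen2 (5 GT/s) use 8b/10b, Gen3 (8 GT/s) uses 128b/130b.
PCIE_ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def pcie_gbits(lnksta_line):
    """Raw one-direction PCIe bandwidth in Gbit/s from an lspci LnkSta line."""
    m = re.search(r"Speed ([\d.]+)GT/s.*?Width x(\d+)", lnksta_line)
    speed, width = float(m.group(1)), int(m.group(2))
    return speed * width * PCIE_ENCODING[speed]


for line in ("LnkSta: Speed 8GT/s, Width x8",     # Gen3 x8, as expected
             "LnkSta: Speed 2.5GT/s, Width x8"):  # Gen1 x8 after a bad boot
    print(line, "->", round(pcie_gbits(line), 1), "Gbit/s")
```

A Gen3 x8 link has roughly 63 Gbit/s of raw bandwidth, enough for an FDR
port, whereas a Gen1 x8 link tops out at 16 Gbit/s, far below the 56 Gbit/s
IB link rate no matter what the IB status reports.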


Brice




On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
> Dear all,
>
> I was wondering whether somebody has/had similar problems as I have.
>
> We have recently purchased a bunch of new nodes. These are Sandy Bridge ones 
> with Mellanox ConnectX-3 MT27500 InfiniBand connectors, and this is where I 
> have problems.
>
> I am usually using Debian Squeeze for my clusters (kernel 2.6.32-5-amd64). 
> Unfortunately, as it turned out I cannot use that kernel as my Intel NIC is 
> not supported here. So I upgraded to 3.2.0-0.bpo.2-amd64 (backport kernel to 
> squeeze). Here I got network but the InfiniBand is not working. The device is 
> not even recognized by ibstatus. Thus, I decided to do an upgrade (not 
> dist-upgrade) to wheezy to get the newer OFED stack.
>
> Here I get the InfiniBand working but only with 8.5 Gb/sec. A simple reseating 
> of the plug increases that to 20 Gb/sec (4X DDR), which is still slower than 
> the speed of the older nodes (40 Gb/sec (4X QDR)).
>
> So I upgraded completely to wheezy (dist-upgrade now) but the problem does not 
> vanish.
> I re-installed squeeze again and installed a vanilla kernel (3.8.8) and the 
> latest OFED stack from their site. And guess what: same experiences here: 
> After a reboot the InfiniBand speed is 8.5 Gb/sec and reseating the plug increases 
> that to 20 Gb/sec. It does not matter whether I connect to the edge switch or 
> to the main switch, in both cases I got the same experiences/observations.
>
> Frankly, I am out of ideas now. I don't think the observed speed change after 
> reseating the plug should happen. I am in touch with the technical support 
> here as well but I think we both are a bit confused.
>
> Now, am I right to assume that the Mellanox ConnectX-3 MT27500 are QDR cards 
> so I should get 40 Gb/sec and not 20 Gb/sec?
>
> Has anybody had similar experiences? Any ideas?
>
> All the best from London
>
> Jörg
>
>



