[Beowulf] Mellanox ConnectX-3 MT27500 problems

Andrew Holway andrew.holway at gmail.com
Sun Apr 28 05:32:39 PDT 2013


Use a real operating system.

On 28 April 2013 09:36, Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:
> Hi Josh,
>
> Interesting. However, I am not using Xen on that machine at all, and I
> don't have the Xen kernel installed. Thus, that is not the problem.
>
> All the best from a sunny London
>
> Jörg
>
>
> On Sunday 28 April 2013 Josh Catana wrote:
>> I noticed that on systems running the Xen kernel's netback driver for
>> virtualization, bandwidth drops to very low rates.
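>>
>> For illustration, a quick way to rule that out is to check whether any
>> Xen modules (netback included) are loaded at all:
>>
>> $ lsmod | grep -i xen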
>>
>> On Apr 27, 2013 6:19 PM, "Brice Goglin" <brice.goglin at gmail.com> wrote:
>> > Hello,
>> >
>> > These cards are not only QDR but FDR; you should get 56 Gbit/s (we see
>> > about 50 Gbit/s in benchmarks, iirc). That's what I get on Sandy Bridge
>> > servers with the exact same IB card model.
>> >
>> > $ ibv_devinfo -v
>> > [...]
>> >
>> >             active_width:        4X (2)
>> >             active_speed:        14.0 Gbps (16)
>> >
>> > These nodes have been running Debian testing/wheezy (default kernel and
>> > IB packages) for 9 months without problems.
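>> >
>> > A quick way to read the negotiated link rate on each node is ibstatus;
>> > the output below is only illustrative:
>> >
>> > $ ibstatus
>> > [...]
>> >     state:       4: ACTIVE
>> >     rate:        56 Gb/sec (4X FDR)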
>> >
>> > I had to fix the cables to get 56Gbit/s link state. Without Mellanox FDR
>> > cables, I was only getting 40. So maybe check your cables. And if you're
>> > not 100% sure about your switch, try connecting the nodes back-to-back.
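>> >
>> > For a raw bandwidth check that takes MPI and IPoIB out of the picture,
>> > the perftest tools should do (assuming the perftest package is
>> > installed on both nodes; "nodeA" is a placeholder hostname):
>> >
>> > # on node A (acts as the server side)
>> > $ ib_write_bw
>> > # on node B (connects to node A and reports bandwidth)
>> > $ ib_write_bw nodeA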
>> >
>> > You can try upgrading the IB card firmware too. Mine is 2.10.700
>> > (likely not up to date anymore, but at least this version works fine).
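>> >
>> > To check the running firmware version without the full Mellanox tool
>> > suite, mstflint should work (the PCI address below is just an example;
>> > take yours from lspci):
>> >
>> > $ sudo mstflint -d 04:00.0 query
>> > [...]
>> >     FW Version:      2.10.700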
>> >
>> > Where does your "8.5Gbit/s" figure come from? IB status or benchmarks?
>> > If benchmarks, it could be related to the PCIe link speed. Upgrading
>> > the BIOS and IB firmware helped me here too (some reboots came up at
>> > PCIe Gen1 instead of Gen3). Here's what you should see in lspci if you
>> > get PCIe Gen3 x8 as expected:
>> >
>> > $ sudo lspci -d 15b3: -vv
>> > [...]
>> >
>> >     LnkSta:    Speed 8GT/s, Width x8
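>> >
>> > For reference, the rough usable PCIe bandwidth after encoding overhead
>> > (my arithmetic, not lspci output):
>> >
>> >     Gen3 x8: 8 GT/s   * 8 lanes * 128/130 ~= 63 Gbit/s (enough for FDR)
>> >     Gen2 x8: 5 GT/s   * 8 lanes * 8/10     = 32 Gbit/s (caps below QDR)
>> >     Gen1 x8: 2.5 GT/s * 8 lanes * 8/10     = 16 Gbit/s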
>> >
>> > Brice
>> >
>> > On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
>> > > Dear all,
>> > >
>> > > I was wondering whether somebody has/had similar problems as I have.
>> > >
>> > > We have recently purchased a bunch of new nodes. These are Sandy
>> > > Bridge ones with Mellanox ConnectX-3 MT27500 InfiniBand adapters, and
>> > > this is where my problems started.
>> > >
>> > > I am usually using Debian Squeeze for my clusters (kernel
>> > > 2.6.32-5-amd64). Unfortunately, as it turned out, I cannot use that
>> > > kernel as my Intel NIC is not supported by it. So I upgraded to
>> > > 3.2.0-0.bpo.2-amd64 (the backport kernel for Squeeze). With that I got
>> > > the network up, but the InfiniBand is not working; the device is not
>> > > even recognized by ibstatus. Thus, I decided to do an upgrade (not
>> > > dist-upgrade) to Wheezy to get the newer OFED stack.
>> > >
>> > > With that I get the InfiniBand working, but only at 8.5 Gb/sec. Simply
>> > > reseating the plug increases that to 20 Gb/sec (4X DDR), which is
>> > > still slower than the older nodes (40 Gb/sec, 4X QDR).
>> > >
>> > > So I upgraded completely to Wheezy (dist-upgrade this time), but the
>> > > problem does not vanish.
>> > > I then re-installed Squeeze, built a vanilla kernel (3.8.8) and
>> > > installed the latest OFED stack from their site. And guess what: the
>> > > same thing happens here. After a reboot the InfiniBand speed is
>> > > 8.5 Gb/sec, and reseating the plug increases that to 20 Gb/sec. It
>> > > does not matter whether I connect to the edge switch or to the main
>> > > switch; in both cases I make the same observations.
>> > >
>> > > Frankly, I am out of ideas now. I don't think the observed speed
>> > > change after reseating the plug should happen. I am in touch with the
>> > > technical support here as well, but I think we are both a bit
>> > > confused.
>> > >
>> > > Now, am I right in assuming that the Mellanox ConnectX-3 MT27500 are
>> > > QDR cards, so I should get 40 Gb/sec and not 20 Gb/sec?
>> > >
>> > > Has anybody had similar experiences? Any ideas?
>> > >
>> > > All the best from London
>> > >
>> > > Jörg
>> >
>
>
> --
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


