[Beowulf] Mellanox ConnectX-3 MT27500 problems
Jörg Saßmannshausen
j.sassmannshausen at ucl.ac.uk
Sun Apr 28 05:38:57 PDT 2013
Hi Andrew,
> Use a real operating system.
I've got a real OS, not a virtual one. Or what do you mean?
Jörg
>
> On 28 April 2013 09:36, Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:
> > Hi Josh,
> >
> > interesting. However, I am not using XEN on that machine at all and I
> > don't have the XEN kernel installed. Thus, that is not the problem.
> >
> > All the best from a sunny London
> >
> > Jörg
> >
> > On Sunday 28 April 2013 Josh Catana wrote:
> >> I noticed that on systems running the xen-kernel netback driver for
> >> virtualization, bandwidth drops to very low rates.
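> >>
> >> A quick way to see whether that driver is in play at all, assuming the
> >> stock module naming, is:
> >>
> >> $ lsmod | grep xen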
> >>
> >> On Apr 27, 2013 6:19 PM, "Brice Goglin" <brice.goglin at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > These cards are QDR and even FDR; you should get 56Gbit/s (we see
> >> > about 50Gbit/s in benchmarks, iirc). That's what I get on Sandy Bridge
> >> > servers with the exact same IB card model.
> >> >
> >> > $ ibv_devinfo -v
> >> > [...]
> >> >
> >> > active_width: 4X (2)
> >> > active_speed: 14.0 Gbps (16)
> >> >
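> >> > That rate is simply active_width times active_speed: 4 lanes at 14 Gbps
> >> > gives 56Gbit/s (FDR), 4 at 10 gives 40 (QDR), and 4 at 5 gives 20 (DDR).
> >> > You can also read the negotiated rate straight from ibstatus; on a good
> >> > FDR link it should show something like:
> >> >
> >> > $ ibstatus
> >> > [...]
> >> > rate: 56 Gb/sec (4X FDR)
> >> >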
> >> > These nodes have been running Debian testing/wheezy (default kernel
> >> > and IB packages) for 9 months without problems.
> >> >
> >> > I had to replace the cables to get a 56Gbit/s link state. Without
> >> > Mellanox FDR cables, I was only getting 40. So maybe check your cables.
> >> > And if you're not 100% sure about your switch, try connecting the nodes
> >> > back-to-back.
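> >> >
> >> > For the back-to-back test, the perftest tools give you a quick bandwidth
> >> > number: run the server side on one node and point the client at it
> >> > ('node1' below is just a placeholder for that node's hostname):
> >> >
> >> > node1$ ib_write_bw
> >> > node2$ ib_write_bw node1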
> >> >
> >> > You can try upgrading the IB card firmware too. Mine is 2.10.700
> >> > (likely not up to date anymore, but at least this one works fine).
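> >> >
> >> > ibv_devinfo reports the firmware you currently have as fw_ver, and
> >> > mstflint can query the card directly (the PCI address below is just an
> >> > example; take yours from lspci):
> >> >
> >> > $ ibv_devinfo | grep fw_ver
> >> > $ sudo mstflint -d 02:00.0 query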
> >> >
> >> > Where does your "8.5Gbit/s" come from? IB status or benchmarks? If
> >> > benchmarks, it could be related to the PCIe link speed. Upgrading the
> >> > BIOS and IB firmware helped me too (some reboots gave PCIe Gen1 instead
> >> > of Gen3). Here's what you should see in lspci if you get PCIe Gen3 x8
> >> > as expected:
> >> >
> >> > $ sudo lspci -d 15b3: -vv
> >> > [...]
> >> >
> >> > LnkSta: Speed 8GT/s, Width x8
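> >> >
> >> > Anything lower there (2.5GT/s is Gen1, 5GT/s is Gen2, or a width below
> >> > x8) means the card is running below spec. To compare what the link can
> >> > do with what it actually negotiated:
> >> >
> >> > $ sudo lspci -d 15b3: -vv | grep -E 'LnkCap|LnkSta'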
> >> >
> >> > Brice
> >> >
> >> > On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
> >> > > Dear all,
> >> > >
> >> > > I was wondering whether somebody has/had similar problems as I have.
> >> > >
> >> > > We have recently purchased a bunch of new nodes. These are Sandy
> >> > > Bridge ones with Mellanox ConnectX-3 MT27500 InfiniBand adapters, and
> >> > > this is where I ran into problems.
> >> > >
> >> > > I usually use Debian Squeeze for my clusters (kernel 2.6.32-5-amd64).
> >> > > Unfortunately, as it turned out, I cannot use that kernel as my Intel
> >> > > NIC is not supported by it. So I upgraded to 3.2.0-0.bpo.2-amd64
> >> > > (backport kernel for squeeze). With that kernel the network works, but
> >> > > the InfiniBand does not; the device is not even recognized by ibstatus.
> >> > > Thus, I decided to do an upgrade (not dist-upgrade) to wheezy to get
> >> > > the newer OFED stack.
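> >> > >
> >> > > (A first thing to check on the 3.2 kernel is presumably whether the
> >> > > mlx4 modules are loaded at all, assuming the stock module names:
> >> > >
> >> > > $ lsmod | grep mlx4
> >> > > $ sudo modprobe mlx4_ib
> >> > >
> >> > > I mention it here so nobody has to ask.)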
> >> > >
> >> > > There I get the InfiniBand working, but only at 8.5 Gb/sec. A simple
> >> > > reseating of the plug increases that to 20 Gb/sec (4X DDR), which is
> >> > > still slower than the speed of the older nodes (40 Gb/sec, 4X QDR).
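> >> > >
> >> > > (The reseating can presumably also be mimicked in software by bouncing
> >> > > the port, assuming infiniband-diags is installed; <lid> is the port LID
> >> > > as reported by ibstat:
> >> > >
> >> > > $ sudo ibportstate <lid> 1 reset
> >> > >
> >> > > That would at least make the effect reproducible without touching the
> >> > > rack.)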
> >> > >
> >> > > So I upgraded completely to wheezy (dist-upgrade now), but the
> >> > > problem does not vanish.
> >> > > I then re-installed squeeze and installed a vanilla kernel (3.8.8)
> >> > > and the latest OFED stack from their site. And guess what: the same
> >> > > experience here. After a reboot the InfiniBand speed is 8.5 Gb/sec,
> >> > > and reseating the plug increases that to 20 Gb/sec. It does not matter
> >> > > whether I connect to the edge switch or to the main switch; in both
> >> > > cases I see the same behaviour.
> >> >
> >> > > Frankly, I am out of ideas now. I don't think the observed speed
> >> > > change after reseating the plug should happen. I am in touch with the
> >> > > technical support here as well, but I think we are both a bit confused.
> >> > >
> >> > > Now, am I right to assume that the Mellanox ConnectX-3 MT27500 are
> >> > > QDR cards, so I should get 40 Gb/sec and not 20 Gb/sec?
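> >> > >
> >> > > If raw numbers help, I can post the negotiated link parameters from
> >> > > one of the affected nodes, e.g. via:
> >> > >
> >> > > $ ibv_devinfo -v | grep -E 'active_width|active_speed'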
> >> > >
> >> > > Has anybody had similar experiences? Any ideas?
> >> > >
> >> > > All the best from London
> >> > >
> >> > > Jörg
> >> >
--
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html