[Beowulf] Mellanox ConnectX-3 MT27500 problems

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Sun Apr 28 00:36:57 PDT 2013


Hi Josh,

Interesting. However, I am not using Xen on that machine at all, and I don't
have the Xen kernel installed, so that cannot be the problem here.

All the best from a sunny London

Jörg


On Sunday 28 April 2013 Josh Catana wrote:
> I noticed that on systems running the Xen kernel's netback driver for
> virtualization, bandwidth drops to very low rates.
> 
> On Apr 27, 2013 6:19 PM, "Brice Goglin" <brice.goglin at gmail.com> wrote:
> > Hello,
> > 
> > These cards are QDR and even FDR, so you should get 56Gbit/s (we see about
> > 50Gbit/s in benchmarks iirc). That's what I get on Sandy Bridge servers
> > with the exact same IB card model.
> > 
> > $ ibv_devinfo -v
> > [...]
> > 
> >             active_width:        4X (2)
> >             active_speed:        14.0 Gbps (16)
> > 
> > These nodes have been running Debian testing/wheezy (default kernel and
> > IB packages) for 9 months without problems.
> > 
> > I had to fix the cables to get 56Gbit/s link state. Without Mellanox FDR
> > cables, I was only getting 40. So maybe check your cables. And if you're
> > not 100% sure about your switch, try connecting the nodes back-to-back.
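
As a rough cross-check of the 40 versus 56 Gbit/s figures: the nominal rate of
an InfiniBand link is the lane count times the per-lane signalling rate, and
the usable data rate is lower because of line encoding. The sketch below uses
the standard per-lane rates and encodings (8b/10b for SDR/DDR/QDR, 64b/66b for
FDR). These numbers and the function name ib_link_rates are assumptions added
for illustration, not something taken from the messages quoted here.

```python
# Per-lane signalling rate in Gbit/s and line-encoding efficiency for each
# InfiniBand generation (standard published figures, assumed here; they are
# not taken from the thread).
IB_SPEEDS = {
    "SDR": (2.5, 8 / 10),
    "DDR": (5.0, 8 / 10),
    "QDR": (10.0, 8 / 10),
    "FDR": (14.0625, 64 / 66),
}


def ib_link_rates(speed, lanes=4):
    """Return (signalling, data) rates in Gbit/s for a link with `lanes` lanes."""
    per_lane, efficiency = IB_SPEEDS[speed]
    signalling = per_lane * lanes
    return signalling, signalling * efficiency


# Print the nominal and usable rates of a 4X link for every generation.
for name in IB_SPEEDS:
    signalling, data = ib_link_rates(name)
    print(f"4X {name}: {signalling:.2f} Gbit/s signalling, {data:.2f} Gbit/s data")
```

For a 4X link this gives 20, 40 and 56.25 Gbit/s signalling for DDR, QDR and
FDR, with data rates of 16, 32 and roughly 54.5 Gbit/s, which fits the
benchmark figure of about 50 Gbit/s quoted above.
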
> > 
> > You can try upgrading the IB card firmware too. Mine is 2.10.700 (likely
> > not up to date anymore, but at least this one works fine).
> > 
> > Where does your "8.5Gbit/s" come from? IB status or benchmarks? If
> > benchmarks, it could be related to the PCIe link speed. Upgrading the
> > BIOS and IB firmware helped me too (some reboots gave PCIe Gen1 instead of
> > Gen3). Here's what you should see in lspci if you get PCIe Gen3 x8 as
> > expected:
> > 
> > $ sudo lspci -d 15b3: -vv
> > [...]
> > 
> >     LnkSta:    Speed 8GT/s, Width x8
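
To check this from a script, the LnkSta line can be parsed and converted into
an upper bound on PCIe bandwidth. This is only a sketch: the function name
pcie_link and the rate table are illustrative assumptions, the encoding
figures are the standard ones (8b/10b for Gen1/Gen2, 128b/130b for Gen3), and
real throughput is lower because of protocol overhead.

```python
import re

# Transfer rate per lane (GT/s) -> (PCIe generation, line-encoding efficiency).
# Standard published figures, assumed here; they are not taken from the thread.
PCIE_GENS = {
    2.5: (1, 8 / 10),
    5.0: (2, 8 / 10),
    8.0: (3, 128 / 130),
}

# Matches e.g. "LnkSta:    Speed 8GT/s, Width x8" in `lspci -vv` output.
LNKSTA = re.compile(r"LnkSta:\s+Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def pcie_link(lspci_text):
    """Return (generation, width, max Gbit/s) for the first LnkSta line found.

    Returns None if there is no LnkSta line or the speed is not in the table.
    """
    match = LNKSTA.search(lspci_text)
    if match is None:
        return None
    rate = float(match.group(1))
    width = int(match.group(2))
    if rate not in PCIE_GENS:
        return None
    generation, efficiency = PCIE_GENS[rate]
    return generation, width, rate * efficiency * width


# Example input: the LnkSta line as it appears in the lspci output above.
sample = "    LnkSta:    Speed 8GT/s, Width x8"
print(pcie_link(sample))
```

Fed the LnkSta line above, it reports Gen3 x8 at about 63 Gbit/s. A link that
trained at Gen1 x8 (2.5GT/s) tops out at 16 Gbit/s, far below what a 4X FDR
port can deliver.
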
> > 
> > Brice
> > 
> > On 27/04/2013 22:05, Jörg Saßmannshausen wrote:
> > > Dear all,
> > > 
> > > I was wondering whether anybody has had problems similar to mine.
> > > 
> > > We have recently purchased a bunch of new nodes. These are Sandy Bridge
> > > ones with Mellanox ConnectX-3 MT27500 InfiniBand cards, and this is
> > > where I ran into problems.
> > > 
> > > I usually use Debian Squeeze for my clusters (kernel 2.6.32-5-amd64).
> > > Unfortunately, as it turned out, I cannot use that kernel because my
> > > Intel NIC is not supported by it. So I upgraded to 3.2.0-0.bpo.2-amd64
> > > (a backport kernel for Squeeze). With that I got networking, but
> > > InfiniBand does not work: the device is not even recognized by
> > > ibstatus. Thus, I decided to do an upgrade (not a dist-upgrade) to
> > > Wheezy to get the newer OFED stack.
> > > 
> > > With Wheezy, InfiniBand works, but only at 8.5 Gb/sec. Simply reseating
> > > the plug increases that to 20 Gb/sec (4X DDR), which is still slower
> > > than the speed of the older nodes (40 Gb/sec (4X QDR)).
> > > 
> > > So I upgraded completely to Wheezy (dist-upgrade this time), but the
> > > problem did not go away.
> > > I then re-installed Squeeze and installed a vanilla kernel (3.8.8) and
> > > the latest OFED stack from their site. And guess what: the same
> > > behaviour. After a reboot the InfiniBand speed is 8.5 Gb/sec, and
> > > reseating the plug increases that to 20 Gb/sec. It does not matter
> > > whether I connect to the edge switch or to the main switch; in both
> > > cases I see the same thing.
> > > 
> > > Frankly, I am out of ideas now. I don't think the speed should change
> > > after reseating the plug. I am in touch with technical support here as
> > > well, but I think we are both a bit confused.
> > > 
> > > Now, am I right to assume that the Mellanox ConnectX-3 MT27500 are QDR
> > > cards, so I should get 40 Gb/sec and not 20 Gb/sec?
> > > 
> > > Has anybody had similar experiences? Any ideas?
> > > 
> > > All the best from London
> > > 
> > > Jörg
> > 
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf


-- 
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html



