[Beowulf] Infinipath memory parity errors
Nifty niftyompi Mitch
niftyompi at niftyegg.com
Thu Aug 14 13:45:01 PDT 2008
On Thu, Aug 14, 2008 at 02:47:58PM -0400, Mark Kosmowski wrote:
> > > Which driver is active? Which Infinipath software release
> > > is installed? The tool "ipath_control -i" can show which...
> > QLogic kernel.org driver
> > 00: Version: Driver 2.0, InfiniPath_QLE7140, InfiniPath1 4.2, PCI 2, SW Compat 2
> > I think this is a 2.1 distribution, whereas there's at least 2.2 now
> > available.
> > > The kernel.org/ofed driver does not have as rich a set of error recovery
> > > code for this card as the shipped driver. The recovery code was seen
> > > as a badness and not accepted by the kernel.org folk....
> > Hmm...
> > > With a kernel update the driver will not have been recompiled
> > > and the kernel.org driver would become active.
> > [Actually it wasn't just a kernel update -- the SuSE 9.3 system disk was
> > removed and replaced by a 10.3 one shortly after I arrived, trashing all
> > the configuration, so I'm a little at sea, without infiniband
> > experience.]
> Have you tried searching for Infinipath drivers at the SUSE 10.3
> repositories? If you're using OpenSUSE rather than SLED / SLES,
> perhaps it would be worth checking the community build repository too.
> Maybe someone has already done the build work for you. I'm
> continually amazed at the useful stuff I find there that I was certain
> I'd have to build for myself.
> For that matter, a clean install may be in order as a last resort.
> Good luck!
Yes, you have the OFED/kernel.org driver.
For this card, do pull and load your drivers from the QLogic support
download area. The latest and greatest for this card will be
on the QLogic site. Also, pick up the documentation at the same time.
The driver is built from source on the system for the active kernel.
And yes, recall that the OFED/kernel.org driver is missing the nonstop
recovery code for parity errors that have been observed on this card.
The kernel.org driver for this card detects the error and stops rather
than risk passing a data error.
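
To check what a node reports, the "ipath_control -i" version line quoted
above can be split into its fields. This is only a minimal sketch in
Python, assuming the output keeps the format of that one quoted line; the
function name parse_ipath_version is made up for illustration and is not
part of any QLogic tool.

```python
import re

def parse_ipath_version(line):
    """Split one 'NN: Version: ...' line into (unit number, list of fields).

    Returns None if the line does not look like a version line.
    The expected format is an assumption based on a single sample line.
    """
    m = re.match(r"\s*(\d+):\s*Version:\s*(.*)", line)
    if m is None:
        return None
    unit = int(m.group(1))
    # Fields are comma separated, e.g. "Driver 2.0", "InfiniPath_QLE7140"
    fields = [f.strip() for f in m.group(2).split(",")]
    return unit, fields

# Sample line copied from the output quoted earlier in this thread
sample = ("00: Version: Driver 2.0, InfiniPath_QLE7140, "
          "InfiniPath1 4.2, PCI 2, SW Compat 2")
print(parse_ipath_version(sample))
```

The first field is the driver version string, which is the part to compare
before and after installing the driver from the QLogic site.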
Tom Mitchell
Got a great hat... now what.