Curious failure...

Doug Ledford dledford@redhat.com
Fri Dec 3 16:05:25 1999


"Robert G. Brown" wrote:
> 
> On Fri, 3 Dec 1999, Mike Isely wrote:
> 
> > If the aic7xxx driver maps the adapter into memory space (I think it
> > does), then it could have mapped over what it thought was empty space, but
> > because there was real memory there (that went undetected), Bad Things
> > might have resulted...  The same scenario would be possible for any memory
> > mapped device (thinking about the video card here).
> 
> That does indeed sound plausible and corresponds to my intuition that 64
> MB OUGHT to be enough to boot a stripped kernel, on the face of things.
> If this is correct, however, it is a pretty serious bug unless/until
> kernels are able to "guarantee" that they can identify the actual memory
> in a system without exceptions.
> 
> The one curious thing, though, is that the bug bit only for the SMP
> kernel; the UP kernel (which still wasn't getting the memory right)
> successfully loaded the UP adaptec module in initrd on the RH
> boot/install floppy or from the disk.  The network failed to work
> (sometimes) even for the UP kernel, but I suspect that this is a
> separate issue.
> 
> I do have a very nice test system if Doug Ledford or anyone else wants
> to suggest a way of finding out if this is the problem and/or finding a
> fix for it.  I already tried (repeatedly!) to use the debug options
> built into the aic7xxx driver but got very little from them -- the
> driver simply goes belly up with a reset bus loop -- waiting for device
> 0.  Needless to say, I played extensively with options as described in
> README.aic7xxx and with the reset delay to no avail.  Again, the system
> ran for literally years with 2.0.3x and various aic7xxx driver revisions
> including the very latest one and runs just fine now with a 2.2.12 UP
> kernel (or 2.2.13 UP) or 2.1.9x SMP (last 2.1 kernel I tried) -- this is
> a fairly recent and apparently SMP-specific problem.

Boot the Linux kernel with the "noapic" option and everything should be fine.
What you describe is the typical symptom when the IO-APIC code in the SMP
kernel gets the interrupt mapping wrong.
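
For anyone trying this: the option goes on the kernel command line, for
example typed after the image label at the LILO "boot:" prompt, or set via an
append= line in /etc/lilo.conf. The following is only a rough sketch, not
part of the driver or of any Red Hat tool, for confirming after boot that the
option took effect. It assumes /proc/cmdline and /proc/interrupts are
available and that IO-APIC-routed interrupts are labelled with the string
"IO-APIC" in /proc/interrupts; the helper names are made up for illustration.

```python
"""Sketch: check whether the running kernel was booted with "noapic"."""


def has_boot_option(cmdline: str, option: str = "noapic") -> bool:
    """Return True if `option` appears as a whole word on the command line."""
    return option in cmdline.split()


def mentions_io_apic(interrupts: str) -> bool:
    """Return True if any line of /proc/interrupts text names the IO-APIC.

    Assumption: IO-APIC-routed interrupts carry the label "IO-APIC".
    """
    return any("IO-APIC" in line for line in interrupts.splitlines())


def read_text(path: str) -> str:
    """Read a small /proc file, returning '' if it is unavailable."""
    try:
        with open(path, "r") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline")
    interrupts = read_text("/proc/interrupts")
    print("noapic on command line:", has_boot_option(cmdline))
    print("IO-APIC named in /proc/interrupts:", mentions_io_apic(interrupts))
```

If the first line reports True and the second False, the kernel has fallen
back to the legacy interrupt controller, which is the configuration suggested
above.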

-- 
  Doug Ledford   <dledford@redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.