Curious failure...
Doug Ledford
dledford@redhat.com
Fri Dec 3 16:05:25 1999
"Robert G. Brown" wrote:
>
> On Fri, 3 Dec 1999, Mike Isely wrote:
>
> > If the aic7xxx driver maps the adapter into memory space (I think it
> > does), then it could have mapped over what it thought was empty space, but
> > because there was real memory there (that went undetected), Bad Things
> > might have resulted... The same scenario would be possible for any memory
> > mapped device (thinking about the video card here).
>
> That does indeed sound plausible and corresponds to my intuition that 64
> MB OUGHT to be enough to boot a stripped kernel, on the face of things.
> If this is correct, however, it is a pretty serious bug unless/until
> kernels are able to "guarantee" that they can identify the actual memory
> in a system without exceptions.
>
> The one curious thing, though, is that the bug bit only for the SMP
> kernel; the UP kernel (which still wasn't getting the memory right)
> successfully loaded the UP adaptec module in initrd on the RH
> boot/install floppy or from the disk. The network failed to work
> (sometimes) even for the UP kernel, but I suspect that this is a
> separate issue.
>
> I do have a very nice test system if Doug Ledford or anyone else wants
> to suggest a way of finding out if this is the problem and/or finding a
> fix for it. I already tried (repeatedly!) to use the debug options
> built into the aic7xxx driver but got very little from them -- the
> driver simply goes belly up with a reset bus loop -- waiting for device
> 0. Needless to say, I played extensively with options as described in
> README.aic7xxx and with the reset delay to no avail. Again, the system
> ran for literally years with 2.0.3x and various aic7xxx driver revisions
> including the very latest one and runs just fine now with a 2.2.12 UP
> kernel (or 2.2.13 UP) or 2.1.9x SMP (last 2.1 kernel I tried) -- this is
> a fairly recent and apparently SMP-specific problem.
Boot the Linux kernel with the "noapic" option and everything should be fine.
What you describe is the typical symptom when the IO-APIC code in the SMP
kernel gets the interrupt mapping wrong.
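
One rough way to confirm that the option took effect is to look at the
interrupt controller column in /proc/interrupts after booting: with the
IO-APIC in use the entries are typically labelled IO-APIC-edge or
IO-APIC-level, and with "noapic" they typically fall back to XT-PIC. The
sketch below is only an illustration of that check. The function names
are hypothetical, and the exact layout of /proc/interrupts is assumed
here and varies between kernel versions.

```python
import re


def interrupt_controllers(proc_interrupts_text):
    """Collect controller labels from /proc/interrupts-style text.

    Assumption: numbered IRQ lines look roughly like
    " 11:   1234   5678   IO-APIC-level  aic7xxx".
    """
    controllers = set()
    for line in proc_interrupts_text.splitlines():
        # Skip the CPU header and non-numbered lines such as NMI: or ERR:
        if not re.match(r"\s*\d+:", line):
            continue
        for token in line.split()[1:]:
            if "IO-APIC" in token or token == "XT-PIC":
                controllers.add(token)
                break
    return controllers


def uses_io_apic(proc_interrupts_text):
    """Return True if any IRQ line is routed through the IO-APIC."""
    return any("IO-APIC" in name
               for name in interrupt_controllers(proc_interrupts_text))
```

On a running system you would pass in the contents of /proc/interrupts,
for example uses_io_apic(open("/proc/interrupts").read()). If it still
returns True after booting with "noapic", the option probably never
reached the kernel (with LILO it is usually typed after the image label
at the boot prompt, or set through an append= line in lilo.conf).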
--
Doug Ledford <dledford@redhat.com>
Opinions expressed are my own, but
they should be everybody's.