Unexplained I/O errors
Steven Timm
timm at fnal.gov
Tue Jul 17 11:37:12 PDT 2001
Hi Donald, thanks for your reply. Actually, we are not running
a 2.4 kernel; we are running 2.2.19. Would that make a difference,
and is there likely to be any change between 2.2.16 and 2.2.19
that would affect UltraDMA/IDE performance?
Steve
------------------------------------------------------------------
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
On Tue, 17 Jul 2001, Donald Becker wrote:
> On Tue, 17 Jul 2001, Steven Timm wrote:
>
> > We are currently burning in a new cluster and seeing the following
> > problem:
> >
> > We see a number of files, usually contiguous in the same directory,
> > that ls lists as present but for which ls -l reports "Input/output error".
> > Running fsck on the filesystem clears the I/O errors, but it also removes
> > the files. There is no error message on the console, nor
> > in /var/log/messages, to indicate any disk controller problems.
>
> I'm guessing that you are running a 2.4 kernel.
> There are a collection of related bugs in the 2.4 kernel IDE and VM
> systems. Note that the 'ac' series (ac==Alan Cox) VM subsystem is
> substantially different from Linus' kernel in an attempt to track this
> down.
>
> > The problem appears to get worse over time; over a period of a few
> > days, the majority of our 136 machines exhibit these errors.
>
> One aspect of running clusters is that any kernel problem is
> dramatically magnified. We frequently get questions about switching to
> a 2.4 kernel, but it's rarely from people with medium or large clusters.
>
> > We have downgraded a few machines to the 2.2.16 kernel, and this
> > appears to be OK, but it is a bit early to tell.
>
> We are staying with the 2.2 kernel for now.
> The 2.4.6 kernel looks pretty good, but it's still too early to tell.
>
> Donald Becker becker at scyld.com
> Scyld Computing Corporation http://www.scyld.com
> 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
> Annapolis MD 21403 410-990-9993
>
>
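The symptom in the original report (a name that ls lists but that ls -l cannot stat) can be checked for mechanically across many nodes. Below is a minimal Python 3 sketch, assuming only the standard library and a modern interpreter; it is an illustration, not a tool used in this thread. It lists each directory (as plain ls does via readdir) and calls lstat() on every entry (the extra call ls -l makes), collecting names that fail with EIO. The function names find_eio_entries, scan_tree and report are hypothetical. Since fsck removes the affected files, this only detects the condition; it says nothing about whether the cause is the kernel, the IDE driver, or the hardware.

```python
import errno
import os


def find_eio_entries(directory):
    """Return sorted names that readdir() lists but lstat() rejects with EIO.

    Mirrors the reported symptom: plain `ls` only needs readdir() and shows
    the name, while `ls -l` also needs lstat() and fails with
    "Input/output error".
    """
    bad = []
    for name in os.listdir(directory):  # readdir(), like plain `ls`
        try:
            os.lstat(os.path.join(directory, name))  # the extra call `ls -l` makes
        except OSError as exc:
            # Only EIO matches the symptom; other errors (e.g. a file removed
            # between listdir() and lstat()) are deliberately ignored here.
            if exc.errno == errno.EIO:
                bad.append(name)
    return sorted(bad)


def scan_tree(root):
    """Yield (directory, bad_names) for every directory under root with EIO entries."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            bad = find_eio_entries(dirpath)
        except OSError:
            continue  # unreadable directory; not the symptom being checked
        if bad:
            yield dirpath, bad


def report(root):
    """Print each affected directory and return how many were found."""
    count = 0
    for dirpath, bad in scan_tree(root):
        count += 1
        print(f"{dirpath}: {len(bad)} entries fail lstat with EIO: {' '.join(bad)}")
    return count
```

On each node one might call report() on the filesystem in question and treat a nonzero return value as a machine needing attention; running it periodically during burn-in would show whether the count grows over time, as described above.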