IDE disk errors
J. G. LaBounty jgl at unix.shell.com
Wed Jun 13 11:04:42 PDT 2001
- Previous message: PBS
- Next message: IDE disk errors
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks for your input. We just this morning booted our 50 node Supermicro
cluster with the noapic option. I will post to the group if it solves our problem.

> From: "Michael T. Prinkey" <mprinkey at aeolusresearch.com>
>
> Hi John,
>
> I have encountered similar problems. I solved them by building the
> kernel without APIC, or by running the kernel with the noapic option.
>
> Regards,
>
> Mike Prinkey
> Aeolus Research, Inc.
>
> "J. G. LaBounty" wrote:
> >
> >
> > We are being swamped with disk errors. Most of the errors are logged
> > as follows:
> >
> > Jun 12 01:44:40 scf402n kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > Jun 12 01:44:40 scf402n kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7975408, sector=2625696
> > Jun 12 01:44:40 scf402n kernel: end_request: I/O error, dev 03:08 (hda), sector 2625696
> >
> > Everything that I can find says this is a media problem. Our typical recovery
> > procedure is to:
> >
> > 1. run e2fsck -c -v -y /dev/hdX
> >    We will run this procedure following a disk error but eventually the
> >    system will hang or we get so many errors, it will take too long to
> >    complete (over 2 hours, with no errors it takes about 45 minutes).
> > 2. If #1 fails, we will run the IBM DFT utility to reformat the drive. After
> >    reformating we have run e2fsck -c and it finds no errors. If reformat
> >    fails, we return the drive for replacement.
> >
> > Configuration:
> > Number     Motherboard           CPU            DISK per node                  AGE        # Failures
> > 34 nodes   on ASUS P2BD          2-600MHz cpus  2 Western Digital 26gb drives  18 months  6
> > 50 nodes   on ASUS P2BD          2-800MHz cpus  2 IBM deskstar 30 gb drives    8 months   21
> > 150 nodes  on Tyan 2500          2-800MHz cpus  2 IBM deskstar 45 gb drives    6 months   104
> >            Disks are attached to a Promise 100 card
> > 50 nodes   on Supermicro 370DLE  2-1GHz cpus    2 IBM deskstar 60 gb drives    2 months   28
> >
> > All nodes are running Redhat 6.2 with a 2.2.16 kernel. DMA is turned on in the
> > kernel plus the Promise 100 patch is installed.
> >
> > For some reason most of our failures have been on the root disk. We have
> > tried running with root and swap on 1 disk and application scratch space on the
> > second disk. While this seems to reduce the frequency of the error, it does
> > not eliminate it.
> >
> > We are also dropping the transfer rate of the device back to a slower speed. We
> > are using DMA mode. As a last resort, we may try PIO mode but really don't
> > want to take that performance hit.
> >
> > This may seem like a lot of work for drives under warranty but IBM no longer makes
> > the 45 gb drive. Warranty returns are taking several weeks to get the replacements.
> > We have found that the replacements are not any better than the drives that
> > can be reformated.
> >
> > We have looked at moving to SCSI drives of similar size but don't want to take the
> > price hit. Adding 2 - scsi drives and a controller would bump our base price
> > 30 - 50%.
> >
> > Has anyone else experienced similar problems? Any suggestions as what we could
> > try to alleviate the problem?
> >
> >
> > John
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

John
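
For anyone wanting to track how often these errors hit each node and disk, here is a minimal Python sketch that counts the "end_request: I/O error" lines per (host, disk) pair. It assumes syslog lines shaped exactly like the three quoted in the message. The names IO_ERROR, count_io_errors and SAMPLE are made up for illustration and are not part of any existing tool, and other kernel versions may log in a different format.

```python
import re
from collections import Counter

# Hypothetical pattern, written to match the "end_request" line quoted in
# the message above; other kernels may format this line differently.
IO_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"end_request: I/O error, dev (?P<dev>[0-9a-f]{2}:[0-9a-f]{2}) "
    r"\((?P<disk>\w+)\), sector (?P<sector>\d+)"
)


def count_io_errors(lines):
    """Return a Counter keyed by (host, disk) of I/O error log lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.match(line)
        if match:
            counts[(match.group("host"), match.group("disk"))] += 1
    return counts


# The three log lines from the original post, used as sample input.
SAMPLE = [
    "Jun 12 01:44:40 scf402n kernel: hda: dma_intr: status=0x51 "
    "{ DriveReady SeekComplete Error }",
    "Jun 12 01:44:40 scf402n kernel: hda: dma_intr: error=0x40 "
    "{ UncorrectableError }, LBAsect=7975408, sector=2625696",
    "Jun 12 01:44:40 scf402n kernel: end_request: I/O error, "
    "dev 03:08 (hda), sector 2625696",
]

if __name__ == "__main__":
    for (host, disk), n in count_io_errors(SAMPLE).items():
        print(host, disk, n)
```

Only the end_request line is counted, so each failed request is tallied once even though the kernel logs three lines for it. In practice the input would come from the collected syslog files of each node rather than from SAMPLE.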
