Disk reliability (Was: Node cloning)

Jeffrey B Layton jeffrey.b.layton at lmco.com
Mon May 14 05:18:12 PDT 2001


Hello,

  I hate to dredge up this topic again, but I've got a machine
with an IBM drive that is giving me the following errors,

kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

as discussed in previous emails on the list. I followed the pointers
that Josip gave and ran the IBM code on the drive. It said the drive
was fine. However, I'm still getting the same error messages.
Can anybody suggest anything else to look at? Perhaps the cabling,
or a new motherboard (it's an Abit board)?
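
For reference, the two hex values in the log lines above can be decoded
by hand. The following is a minimal sketch, not the kernel's own code: the
bit names and masks are assumed from the ATA status/error register layout
and from the names printed in the log itself, and the function names are
made up for illustration. In that layout, bit 7 of the error register
reported together with the abort bit is shown as BadCRC, which for Ultra
DMA transfers normally means an interface CRC error rather than a media
defect. That would be consistent with checking the cabling, though it
proves nothing by itself.

```python
import re

# Status register bits, highest bit first (names assumed to match what
# the Linux IDE driver prints).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of all status bits set in value."""
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Return the names of all error bits set in value."""
    names = []
    if value & 0x04:
        names.append("DriveStatusError")
    if value & 0x80:
        # Bit 7 reads as a CRC error when the command was also aborted,
        # otherwise as a bad sector.
        names.append("BadCRC" if value & 0x04 else "BadSector")
    if value & 0x40:
        names.append("UncorrectableError")
    if value & 0x10:
        names.append("SectorIdNotFound")
    if value & 0x02:
        names.append("TrackZeroNotFound")
    if value & 0x01:
        names.append("AddrMarkNotFound")
    return names


LINE_RE = re.compile(r"(\w+): dma_intr: (status|error)=0x([0-9a-fA-F]+)")


def decode_line(line):
    """Decode one 'dma_intr' log line into (drive, kind, bit names)."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    drive, kind, hexval = match.groups()
    value = int(hexval, 16)
    names = decode_status(value) if kind == "status" else decode_error(value)
    return drive, kind, names


if __name__ == "__main__":
    print(decode_line("kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }"))
    print(decode_line("kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }"))
```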

TIA,

Jeff



Josip Loncaric wrote:

> Thanks to several constructive responses, the following picture emerges:
>
> (1) Modern IDE drives can automatically remap a certain number of bad
> blocks.  While they are doing this correctly, the OS should not even see
> a bad block.
>
> (2) However, the drive's capacity to do this is limited to 256 bad
> blocks or so.  If more bad blocks exist, then the OS will start to see
> them.  To recover from this without replacing the hard drive, one can
> detect and map out the bad blocks using 'e2fsck -c ...' and 'mkswap -c
> ...' commands.  Obviously, the partition where this is being done should
> not be in use (turn swap off first, unmount the file system or reboot
> after doing "echo '-f -c' >/fsckoptions").
>
> (3) In general, IDE cables should be at most 18" long with both ends
> plugged in (no stubs), and preferably serving only one (master) drive.
>
> For IBM drives (IDE or SCSI), one can download and use the Drive Fitness
> Test utility (see
> http://www.storage.ibm.com/techsup/hddtech/welcome.htm).  This program
> can diagnose typical problems with hard drives.  In many cases, bad
> blocks can be 'healed' by erasing the drive using this utility (back up
> your data first, and be prepared for the 'Erase Disk' to take an hour or
> more).  If that fails and your drive is under warranty, the drive ought
> to be replaced.
>
> For older existing drives (in less critical applications, e.g. to boot
> Beowulf client nodes where the same data is mirrored by other nodes),
> mapping out bad blocks as needed is probably adequate.
>
> Finally, the existing Linux S.M.A.R.T. utilities apparently do not
> handle every SMART drive correctly.  Use with caution.
>
> Sincerely,
> Josip
>
> --
> Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
> ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
> NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
> Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134

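The bad-block procedure in point (2) of the quoted message can be written
out as an explicit command sequence. This is a rough sketch only: it
prints the commands rather than running them, the device names /dev/hda2
and /dev/hda3 are hypothetical placeholders, and the helper names are
invented for illustration. The only flags used are the 'mkswap -c' and
'e2fsck -c' options named in the message.

```python
import shlex


def swap_recheck_plan(device):
    """Commands to recreate a swap partition while scanning for bad blocks."""
    return [
        ["swapoff", device],       # the partition must not be in use
        ["mkswap", "-c", device],  # -c: check the device for bad blocks first
        ["swapon", device],        # put the swap area back into service
    ]


def ext2_recheck_plan(device):
    """Commands to scan an unmounted ext2 file system for bad blocks."""
    return [
        ["umount", device],        # the file system must not be mounted
        ["e2fsck", "-c", device],  # -c: scan for bad blocks and map them out
    ]


def render(plan):
    """Turn each argv list into a shell-quoted command line for review."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in plan]


if __name__ == "__main__":
    # Dry run: print the commands so they can be checked before anything
    # is executed by hand as root. Device names are placeholders.
    for line in render(swap_recheck_plan("/dev/hda2")):
        print(line)
    for line in render(ext2_recheck_plan("/dev/hda3")):
        print(line)
```

Printing the plan instead of executing it is deliberate, since both steps
act on a whole partition and should only be run against the right device.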




