Disk reliability (Was: Node cloning)
Josip Loncaric
josip at icase.edu
Tue May 22 08:10:06 PDT 2001
Mark Hahn wrote:
>
> > as discussed in previous emails on the list. I followed the pointers
> > that Josip gave and ran the IBM code on the drive. It said the drive
>
> the code in question probably just configured the drive
> to default to udma33 or something modest. this shouldn't ever
> be necessary, since the bios shouldn't misconfigure a too-high
> speed, and any modern Linux will not. (though you can choose your
> own mode using hdparm, if you wish.)

IBM's Drive Fitness Test (DFT) actually does a lot more than that. It
accesses the IBM hard drive microcode to diagnose drive operation and,
when necessary, it can remap new bad blocks and zero the disk. For more
detail, see the DFT white paper:
http://www.storage.ibm.com/hardsoft/diskdrdl/technolo/dft/dft.htm

FYI, we applied this procedure to two IBM hard drives which had
developed too many bad blocks (155 and 84, respectively), and we have
not seen any bad blocks since then (for over a month). Since IBM's DFT
program relies on special DFT microcode in IBM hard drives to learn
about low-level performance details, I am not sure if it can do much
for non-IBM drives.

JackM wrote:
>
> You can try using hdparm to turn the DMA off. Of course, it does slow
> down data transfer rates considerably.
As Mark said, BadCRC only means that the transfer was retried. If a few
BadCRC messages are the only problem, I would not turn off DMA.
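
To judge whether you are looking at "a few" BadCRC messages or a steady
stream, it may help to count them per drive. Here is a minimal Python
sketch. It assumes the kernel log lines look roughly like
"hda: dma_intr: error=0x84 { DriveStatusError BadCRC }" and that drives
are named hda, hdb, and so on; the function name count_badcrc is my own
invention, so adjust the pattern to whatever your logs actually show.

```python
import re
from collections import Counter

# Assumed message shape (may differ on your kernel), e.g.:
#   hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
BADCRC_RE = re.compile(r"\b(hd[a-z]): .*\bBadCRC\b")


def count_badcrc(log_lines):
    """Return a Counter mapping drive name to the number of BadCRC lines."""
    counts = Counter()
    for line in log_lines:
        match = BADCRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

You could feed it the lines of your system log (any iterable of strings
will do) and print the resulting counts. A handful of retries spread
over weeks is one thing; hundreds per day on one drive would make me
look at the cable or the drive itself.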
BTW, some early UltraDMA drives have known problems (e.g.
http://www.seagate.com/support/kb/disc/bigbear.html) and if you have a
drive like that, turning off DMA is advisable.
Sincerely,
Josip
--
Dr. Josip Loncaric, Research Fellow mailto:josip at icase.edu
ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134