[Beowulf] RAID question

Brendan Moloney moloney.brendan at gmail.com
Tue Mar 17 17:34:16 PDT 2015


Dell controllers are (almost?) always rebranded LSI.  You can use the LSI
tool "megacli", or, if you prefer a slightly less insane UI, the newer
"storcli". To see what happened during your consistency check you would
need to look at the controller logs that these tools can dump.  Those logs
will probably also help you work out what went wrong in the first place.
Unfortunately, these RAID controllers do not allow you to access any SMART
data unless they are in JBOD mode, so you pretty much have to pull the disk
and check it on another machine.

Last I looked, the default behavior for a consistency check is to assume
the primary copy is correct wherever there is a discrepancy and to
overwrite the parity/mirror version with it.  That is a pretty dumb
default (you can at least disable this automatic "repair").  So your
consistency check might have wiped out the good data...
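To make that failure mode concrete, here is a toy model in Python of a two-way mirror check that treats the primary copy as authoritative. It is not LSI firmware: the list-of-blocks layout, the `check_consistency` function and its `auto_repair` flag are all hypothetical, and only the mirror case is modelled. If the primary is the copy that went bad, the "repair" copies the bad block over the good mirror:

```python
from typing import List


def check_consistency(primary: List[bytes], mirror: List[bytes],
                      auto_repair: bool = True) -> List[int]:
    """Toy two-way mirror check (hypothetical model, not real controller firmware).

    Compares the two halves block by block. The copies alone cannot say which
    one is right, so the primary is assumed correct; with auto_repair=True every
    mismatching mirror block is overwritten with the primary's contents.
    Returns the indices of the blocks that differed.
    """
    if len(primary) != len(mirror):
        raise ValueError("mirror halves must have the same number of blocks")
    mismatches = []
    for i, (p, m) in enumerate(zip(primary, mirror)):
        if p != m:
            mismatches.append(i)
            if auto_repair:
                mirror[i] = p  # the "repair": the primary wins, right or wrong
    return mismatches


# Block 1 on the primary disk has silently gone bad; the mirror still holds good data.
good = [b"boot", b"bin", b"etc"]
primary = [b"boot", b"\x00\x00\x00", b"etc"]
mirror = list(good)

print(check_consistency(primary, mirror))  # [1]
print(mirror == good)                      # False: the good copy was overwritten
```

With `auto_repair=False` the same call only reports the mismatching blocks and leaves both halves untouched, which is the point of disabling the automatic "repair".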



On Tue, Mar 17, 2015 at 2:57 PM, Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:

> Hi David,
>
> To me, it looks like either a controller or a disc issue.
>
> I have seen these problems before on SCSI discs when the controller had a
> problem. Depending on the manufacturer, it might be a good idea to contact
> them and see if they have more information. I have had problems with RAID
> controllers in the past, and in one case the manufacturer was ever so
> helpful in diagnosing and repairing a failed RAID5.
>
> So it might be a good idea to try them.
>
> All the best from a cold London
>
> Jörg
>
>
> On Monday 16 March 2015 mathog wrote:
> > Thanks for the feedback.
> >
> > After copying /boot and /bin from another machine and mucking about with
> > grub for far too long (I had to edit grub.conf to change the virtual disk
> > names, and CentOS's rescue disk saw the boot disk as hd1, while grub
> > itself, once it actually started, saw it as hd0), the system is back online.
> >
> > The logs don't show a root command line that specifically took out those
> > directories.  They do show a bunch of scripts being run.  My best guess
> > is that one of them did something like this:
> >
> >    AVAR=`command that failed and returned an empty string`
> >    rm -rf ${AVAR}/b*
> >
> > It seems unlikely that a low-level controller failure would have snipped
> > out those files/directories without leaving a file system that fsck
> > reported as corrupt.
> >
> > That said, there is something hardware-related going on, since
> > /var/log/messages has a lot of these:
> >
> > Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  Sense Key : Recovered Error [current] [descriptor]
> > Mar 16 12:37:27 mandolin kernel: Descriptor sense data with sense descriptors (in hex):
> > Mar 16 12:37:27 mandolin kernel:        72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> > Mar 16 12:37:27 mandolin kernel:        00 4f 00 c2 40 50
> > Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  ASC=0x4 ASCQ=0x1d
> >
> > That group has several other similar Dell servers, and this is the only
> > one logging these.  sdb1 holds /boot, and sdb2 is where LVM keeps its
> > information.
> >
> > Regards,
> >
> > David Mathog
> > mathog at caltech.edu
> > Manager, Sequence Analysis Facility, Biology Division, Caltech
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
>
> --
> *************************************************************
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>
>
>
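David's guess (`rm -rf ${AVAR}/b*` with an empty `AVAR`) fits the damage neatly: on a typical CentOS root filesystem the only top-level entries matching `/b*` are `/bin` and `/boot`, exactly the two directories that had to be restored. The Python sketch below only builds and matches path strings and deletes nothing. `naive_target` mimics the shell's plain concatenation, `guarded_target` is a hypothetical helper that refuses an empty or root prefix, and `ROOT_ENTRIES` is a hard-coded illustrative list, not read from disk. In the shell itself, writing `${AVAR:?}` makes the expansion fail with an error when `AVAR` is unset or empty instead of silently expanding to nothing.

```python
import fnmatch
import os

# Illustrative subset of top-level names on a typical CentOS root filesystem
# (hard-coded for the example, not read from disk).
ROOT_ENTRIES = ["bin", "boot", "dev", "etc", "home", "lib", "lib64",
                "opt", "root", "sbin", "tmp", "usr", "var"]


def naive_target(avar: str) -> str:
    """What `${AVAR}/b*` becomes before globbing: plain string concatenation."""
    return f"{avar}/b*"


def guarded_target(avar: str) -> str:
    """Hypothetical guard: refuse to build a deletion pattern from an empty or root prefix."""
    if not avar.strip() or os.path.normpath(avar) == "/":
        raise ValueError(f"refusing to build a deletion target from prefix {avar!r}")
    return os.path.join(avar, "b*")


def root_matches(pattern: str) -> list:
    """Top-level entries that a pattern anchored directly at / would hit (no filesystem access)."""
    if not pattern.startswith("/") or pattern.count("/") != 1:
        return []
    return [name for name in ROOT_ENTRIES if fnmatch.fnmatchcase(name, pattern[1:])]


target = naive_target("")    # the command substitution failed, so AVAR is empty
print(target)                # /b*
print(root_matches(target))  # ['bin', 'boot']

try:
    guarded_target("")
except ValueError as err:
    print(err)               # refusing to build a deletion target from prefix ''
```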
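The hex dump in David's kernel log is descriptor-format sense data, and its first bytes already carry what the kernel summarised: byte 0 is the response code (0x72, a current error in descriptor format), the low nibble of byte 1 is the sense key (0x1, RECOVERED ERROR), bytes 2 and 3 are the ASC and ASCQ (0x04 and 0x1d), and byte 7 gives the length of the descriptor list that follows. The Python sketch below decodes that layout. It is a minimal parser written for this thread, not a kernel or library API; it names only the standard sense keys and deliberately leaves each descriptor as a raw (type, bytes) pair rather than guessing what 04/1d or the descriptor contents mean.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Standard SCSI sense-key names; keys not listed here are reported as hex numbers.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}


@dataclass
class DescriptorSense:
    response_code: int
    sense_key: int
    asc: int
    ascq: int
    descriptors: List[Tuple[int, bytes]]  # (descriptor type, descriptor body)

    def key_name(self) -> str:
        return SENSE_KEYS.get(self.sense_key, f"0x{self.sense_key:x}")


def parse_descriptor_sense(hex_dump: str) -> DescriptorSense:
    """Decode descriptor-format (0x72/0x73) sense data given as a hex dump."""
    data = bytes.fromhex(hex_dump)  # whitespace and newlines between bytes are ignored
    if len(data) < 8:
        raise ValueError("descriptor sense data needs at least an 8-byte header")
    response_code = data[0] & 0x7F
    if response_code not in (0x72, 0x73):
        raise ValueError(f"not descriptor-format sense data (response code 0x{response_code:02x})")
    end = min(len(data), 8 + data[7])  # byte 7: additional sense length; tolerate a truncated dump
    descriptors = []
    pos = 8
    while pos + 2 <= end:
        dtype, dlen = data[pos], data[pos + 1]  # each descriptor: type byte, length byte, body
        descriptors.append((dtype, data[pos + 2 : pos + 2 + dlen]))
        pos += 2 + dlen
    return DescriptorSense(response_code, data[1] & 0x0F, data[2], data[3], descriptors)


# Hex bytes copied from the two kernel log lines in David's message.
dump = """72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
          00 4f 00 c2 40 50"""
sense = parse_descriptor_sense(dump)
print(sense.key_name())                               # RECOVERED ERROR
print(f"ASC=0x{sense.asc:x} ASCQ=0x{sense.ascq:x}")   # ASC=0x4 ASCQ=0x1d
for dtype, body in sense.descriptors:
    print(f"descriptor type 0x{dtype:02x}, {len(body)} bytes: {body.hex()}")
```

Run on the logged bytes, this reproduces the kernel's own summary line and shows a single 12-byte descriptor of type 0x09, which is a reasonable starting point before looking the ASC/ASCQ pair up in the SCSI tables.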

