[Beowulf] RAID question

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Wed Mar 18 08:46:43 PDT 2015


Hi all,

you can read the SMART data of a disc even when it is hooked up to (some)
LSI controllers, without pulling it:

Try:

$ smartctl -i /dev/sda -d megaraid,X

You need to play around a bit with X, as that is the number the controller
uses for the disc and it is not always obvious. As the option shows, I am
using a MegaRAID controller card, and that is working well here.
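
If you do not want to guess X by hand, a small loop can try each candidate
in turn. This is only a sketch: the function name probe_megaraid, the
SMARTCTL override variable and the default upper bound of 31 are my own
choices, not anything smartctl provides. It counts an ID as present when
bits 0 and 1 of smartctl's exit status are clear (command line parsed,
device opened).

```shell
#!/bin/sh
# Hypothetical helper: list the megaraid device IDs that answer "smartctl -i".
# SMARTCTL can be overridden, e.g. with a full path to the binary.
SMARTCTL=${SMARTCTL:-smartctl}

probe_megaraid() {
    dev=$1          # block device behind the controller, e.g. /dev/sda
    max=${2:-31}    # highest ID to try (assumed default, adjust as needed)
    id=0
    while [ "$id" -le "$max" ]; do
        rc=0
        "$SMARTCTL" -i -d "megaraid,$id" "$dev" >/dev/null 2>&1 || rc=$?
        # Bit 0: command line did not parse. Bit 1: device open failed.
        if [ $((rc & 3)) -eq 0 ]; then
            echo "$id"
        fi
        id=$((id + 1))
    done
}
```

Run as root, something like "probe_megaraid /dev/sda 15" prints one ID per
line; each of those can then be used as X in the command above.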

As always: your mileage may vary.

All the best from a sunny London

Jörg

On Wednesday 18 Mar 2015 00:34:16 you wrote:
> Dell controllers are (almost?) always rebranded LSI.  You can use the LSI
> tool "megacli", or if you prefer a slightly less insane UI you can use the
> newer "storcli". To see what happened during your consistency check you
> would need to check the logs that can be generated with these tools.  These
> logs will probably also help you determine what went wrong initially.
> Unfortunately, these RAID controllers do not allow you to access any SMART
> data unless they are in JBOD mode so you pretty much have to pull the disk
> and check it on another machine.
> 
> Last I looked, the default behavior for a consistency check is to assume
> the primary copy is correct for any discrepancies and overwrite the
> parity/mirror versions with that.  Which is pretty dumb for a default
> setting (you can at least disable this automatic "repair").  So your
> consistency check might have wiped the good data...
> 
> 
> 
> On Tue, Mar 17, 2015 at 2:57 PM, Jörg Saßmannshausen
> <j.sassmannshausen at ucl.ac.uk> wrote:
> > Hi David,
> > 
> > for me it looks like either a controller or disc issue.
> > 
> > I have seen these problems before on SCSI discs when the controller had a
> > problem. Depending on the manufacturer it might be a good idea to contact
> > them and see if they have more information here. I have had some problems in
> > the past with RAID controllers and the manufacturer here was ever so
> > helpful in the diagnosis and repair of a failed RAID5 for example.
> > 
> > So it might be a good idea to try them.
> > 
> > All the best from a cold London
> > 
> > Jörg
> > 
> > On Monday 16 March 2015 mathog wrote:
> > > Thanks for the feedback.
> > > 
> > > After copying /boot and /bin from another machine and mucking about
> > > with grub for far too long (had to edit grub.conf to change virtual
> > > disk names, and in CentOS's rescue disk it saw the boot disk as hd1,
> > > but when grub actually started, it saw it as hd0) the system is back
> > > on line.
> > > 
> > > The logs don't show a root command line that specifically took out
> > > > those directories.  They do show a bunch of scripts being run.  My
> > > > best guess is that one of them did something like this:
> > >    AVAR=`command that failed and returned an empty string`
> > >    rm -rf ${AVAR}/b*
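
If that guess is right, the variable was set but empty, so the glob expanded
to /b* and matched /bin and /boot. Note that "set -u" would not have helped,
as it only catches unset variables. A minimal sketch of a guard, assuming a
POSIX shell; the function name clean_b_entries is hypothetical and only
mirrors the rm line quoted above:

```shell
#!/bin/sh
# Hypothetical guarded version of "rm -rf ${AVAR}/b*".
clean_b_entries() {
    dir=$1
    # Refuse an empty prefix (which would turn the glob into /b*) and "/".
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "clean_b_entries: refusing empty or root directory" >&2
        return 1
    fi
    # ${dir:?} is a second line of defence: the expansion fails with an
    # error if dir is unset or empty, so rm is never run on a bare /b*.
    rm -rf -- "${dir:?}"/b*
}
```

A script would then call clean_b_entries "$AVAR" and stop on a non-zero
return instead of carrying on.
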
> > > 
> > > It seems unlikely that a low level controller failure would have
> > > snipped out those files/directories without resulting in a file system
> > > that was seen as corrupt by fsck.
> > > 
> > > That said, there is something hardware related going on, since
> > > /var/log/messages has a lot of these (sorry about the wrap):
> > > 
> > > Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  Sense Key :
> > > Recovered Error [current] [descriptor]
> > > Mar 16 12:37:27 mandolin kernel: Descriptor sense data with sense
> > > descriptors (in hex):
> > > Mar 16 12:37:27 mandolin kernel:        72 01 04 1d 00 00 00 0e 09 0c
> > > 00 00 00 00 00 00
> > > Mar 16 12:37:27 mandolin kernel:        00 4f 00 c2 40 50
> > > Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb]  ASC=0x4 ASCQ=0x1d
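
The hex dump can be picked apart by hand. In descriptor-format sense data
(response code 0x72 or 0x73) byte 1 carries the sense key, bytes 2 and 3 the
ASC and ASCQ, and byte 7 the additional length; the descriptors start at
byte 8 with a type and a length. A rough sketch, where the function name
decode_sense and the output format are my own invention; it only reads the
header and the first descriptor and does not interpret the codes:

```shell
#!/bin/sh
# Hypothetical helper: decode the header of descriptor-format sense data.
# Arguments: the sense bytes as two-digit hex tokens, in order.
decode_sense() {
    if [ "$#" -lt 8 ]; then
        echo "decode_sense: need at least the 8-byte header" >&2
        return 1
    fi
    resp=$((0x$1 & 0x7f))
    # 0x72 = 114 and 0x73 = 115; fixed-format data has another layout.
    if [ "$resp" -ne 114 ] && [ "$resp" -ne 115 ]; then
        echo "decode_sense: not descriptor-format sense data" >&2
        return 1
    fi
    key=$((0x$2 & 0x0f))    # sense key is the low nibble of byte 1
    asc=$((0x$3))
    ascq=$((0x$4))
    addlen=$((0x$8))        # number of bytes following the header
    printf 'response=0x%02x key=0x%x asc=0x%02x ascq=0x%02x additional_length=%d\n' \
        "$resp" "$key" "$asc" "$ascq" "$addlen"
    if [ "$#" -ge 10 ]; then
        shift 8
        printf 'descriptor_type=0x%02x descriptor_length=%d\n' \
            "$((0x$1))" "$((0x$2))"
    fi
}
```

Fed the bytes from the log above (72 01 04 1d 00 00 00 0e 09 0c ...), it
reports sense key 0x1 (Recovered Error), matching the kernel's own text, and
a first descriptor of type 0x09, which is the ATA Status Return descriptor.
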
> > > 
> > > That group has several other similar Dell servers, and this is the only
> > > one logging these.  sdb1 holds /boot and sdb2 is where the lvm keeps
> > > its information.
> > > 
> > > Regards,
> > > 
> > > David Mathog
> > > mathog at caltech.edu
> > > Manager, Sequence Analysis Facility, Biology Division, Caltech
> > > _______________________________________________
> > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> > > Computing To change your subscription (digest mode or unsubscribe)
> > > visit http://www.beowulf.org/mailman/listinfo/beowulf
> > 

-- 
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html