[Beowulf] RAID question
Jörg Saßmannshausen
j.sassmannshausen at ucl.ac.uk
Tue Mar 17 14:57:43 PDT 2015
Hi David,
To me this looks like either a controller or a disc issue.
I have seen these problems before on SCSI discs when the controller had a
problem. Depending on the manufacturer, it might be a good idea to contact them
and ask whether they have more information. I have had problems with RAID
controllers in the past, and the manufacturer was very helpful in diagnosing
and repairing a failed RAID5, for example.
So it might be worth trying them.
All the best from a cold London
Jörg
On Monday 16 March 2015 mathog wrote:
> Thanks for the feedback.
>
> After copying /boot and /bin from another machine and mucking about with
> grub for far too long (had to edit grub.conf to change virtual disk
> names, and in CentOS's rescue disk it saw the boot disk as hd1, but when
> grub actually started, it saw it as hd0) the system is back on line.
>
> The logs don't show a root command line that specifically took out those
> directories. They do show a bunch of scripts being run. My best guess
> is that one of them did something like this:
>
> AVAR=`command that failed and returned an empty string`
> rm -rf ${AVAR}/b*
>
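The failure mode guessed at above (an empty variable turning the path into /b*) can be guarded against in the script itself. What follows is only a minimal sketch, assuming a POSIX-style shell; the helper name safe_prefix is hypothetical and is not taken from the scripts David mentions, and the sketch only prints what it would delete instead of calling rm.

```shell
#!/bin/sh
# Hypothetical helper: refuse to hand back an empty or "/" prefix,
# so that "$prefix/b*" can never expand to "/b*".
safe_prefix() {
    if [ -z "$1" ] || [ "$1" = "/" ]; then
        echo "refusing to use empty or root prefix" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Stand-in for "command that failed and returned an empty string";
# its exit status is ignored here, as the suspected script may have done.
AVAR=$(false) || true

if dir=$(safe_prefix "$AVAR"); then
    echo "would run: rm -rf -- $dir/b*"
else
    echo "skipped cleanup"
fi

# The standard ${VAR:?message} expansion is a shorter guard: a
# non-interactive shell aborts when VAR is unset or empty. It is run in
# a subshell here so that the demonstration itself keeps going.
( : "${AVAR:?AVAR is empty}" ) 2>/dev/null || echo "guard tripped"
```

In a real script the same idea is often written directly as rm -rf -- "${AVAR:?}"/b*, which stops the script before rm is ever invoked.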
> It seems unlikely that a low level controller failure would have snipped
> out those files/directories without resulting in a file system that was
> seen as corrupt by fsck.
>
> That said, there is something hardware-related going on, since
> /var/log/messages has a lot of these:
>
> Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb] Sense Key : Recovered Error [current] [descriptor]
> Mar 16 12:37:27 mandolin kernel: Descriptor sense data with sense descriptors (in hex):
> Mar 16 12:37:27 mandolin kernel: 72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> Mar 16 12:37:27 mandolin kernel: 00 4f 00 c2 40 50
> Mar 16 12:37:27 mandolin kernel: sd 7:0:0:0: [sdb] ASC=0x4 ASCQ=0x1d
>
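When talking to the manufacturer it can help to split the hex dump in the kernel log lines quoted above into its fields. The sketch below assumes the usual descriptor-format sense layout (byte 0 response code, byte 1 sense key, byte 2 ASC, byte 3 ASCQ, byte 7 additional length, descriptors from byte 8 onwards); the function name decode_sense is hypothetical, and the byte string is copied from the log.

```shell
#!/bin/sh
# The 22 bytes from the two hex lines in the kernel log, joined together.
sense="72 01 04 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 00 4f 00 c2 40 50"

# Hypothetical helper: print the header fields of descriptor-format
# sense data given as space-separated hex bytes.
decode_sense() {
    # Deliberately unquoted: split the bytes into $1..$N.
    set -- $1
    case "$1" in
        72|73) ;;   # response codes used for descriptor-format sense data
        *) echo "not descriptor-format sense data" >&2; return 1 ;;
    esac
    echo "sense_key=$(( 0x$2 & 0x0f ))"        # low nibble of byte 1
    echo "asc=0x$3"
    echo "ascq=0x$4"
    echo "additional_length=$(( 0x$8 ))"       # bytes following byte 7
    echo "descriptor_type=0x$9"                # first descriptor, byte 8
    echo "descriptor_length=$(( 0x${10} ))"    # its own additional length
}

decode_sense "$sense"
```

The decoded ASC and ASCQ agree with the ASC=0x4 ASCQ=0x1d line the kernel prints, and sense key 1 corresponds to the "Recovered Error" text. As far as I know, descriptor type 0x09 is the ATA Status Return descriptor from the SCSI/ATA Translation standard, which would mean the remaining 12 bytes carry ATA register values passed back through the controller; treat that reading as an assumption to check with the vendor.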
> That group has several other similar Dell servers, and this is the only
> one logging these. sdb1 holds /boot and sdb2 is where the lvm keeps its
> information.
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
--
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html