[Beowulf] RAID5 rebuild, remount with write without reboot?

Tue Sep 5 10:52:30 PDT 2017

Short of an actual power cycle, updating the drive firmware would be the
only way to trick the drives into behaving as if they had been power-cycled.
That is obviously very risky. A reboot should be low risk.

On Tue, Sep 5, 2017 at 12:28 PM, mathog <mathog at caltech.edu> wrote:

> Short form:
>
> An 8 disk (all 2 TB SATA) RAID5 on an LSI MR-USAS2 SuperMicro controller
> (lspci shows " LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon]")
> system was long ago configured with a small partition of one disk as /boot
> and logical volumes for / (root) and /home on a single large virtual drive
> on the RAID.  Due to disk problems and a self goal (see below) the array
> went into a degraded=1 state (as reported by megacli) and write locked both
> root and home.  When the failed disk was replaced and the rebuild completed
> those were both still write locked.  "mount -a" didn't help in either
> case.  A reboot brought them up normally but ideally that should not have
> been necessary.  Is there a method to remount the logical volumes writable
> that does not require a reboot?
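
On the remount question: "mount -a" only mounts fstab entries that are not
already mounted, so it will not flip an already-mounted read-only filesystem
back to read-write. The explicit form is "mount -o remount,rw <mountpoint>".
A rough sketch for spotting which mounts are currently read-only follows.
The function name list_ro_mounts is made up for illustration, and whether
the remount then succeeds depends on the filesystem: if it aborted its
journal after I/O errors it may refuse until it has been unmounted and
checked, which for / amounts to a reboot anyway.

```shell
# Print the mount points whose option list contains "ro".
# Input is /proc/mounts format: device mountpoint fstype options dump pass
# (list_ro_mounts is a hypothetical helper name, not an existing tool.)
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# Possible use, as root, once the array is healthy again:
#   list_ro_mounts
#   mount -o remount,rw /home
```
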
>
> Long form:
>
> Periodic testing of the disks inside this array turned up pending sectors
> with this command:
>
>    smartctl -a  /dev/sda -d sat+megaraid,7
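
For checking more than one disk this way, a small filter on the attribute
table can help. This is only a sketch: pending_sectors is a made-up name,
and it assumes the usual ATA attribute table layout from smartctl, where
the raw value is the tenth column.

```shell
# Read `smartctl -A` (or -a) output on stdin and print the raw value of
# the Current_Pending_Sector attribute.
# (pending_sectors is a hypothetical helper name.)
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Possible use; the list of N values here is only an example:
#   for n in 4 5 7; do
#       printf '%s: ' "$n"
#       smartctl -A /dev/sda -d sat+megaraid,"$n" | pending_sectors
#   done
```
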
>
> A replacement disk was obtained and the usual replacement method applied:
>
> megacli -pdoffline -physdrv[64:7] -a0
> megacli -pdmarkmissing -physdrv[64:7] -a0
> megacli -pdprprmv -physdrv[64:7] -a0
> megacli -pdlocate -start -physdrv[64:7] -a0
>
> The disk with the flashing light was physically swapped.  The smartctl
> command was run again and unfortunately its values were unchanged.  I had
> always assumed that the "7" in that smartctl command was a physical slot;
> it turns out that it is actually the "Device ID".  In my defense the
> smartctl man page does a very poor job describing this:
>
>   megaraid,N - [Linux only] the device consists of one or more SCSI/SAS disks
>   connected to a MegaRAID controller.  The non-negative integer N (in
>   the range of 0 to 127 inclusive) denotes which disk on the controller
>   is monitored.  Use syntax such as:
>
> In this system, unlike the others I had worked on previously, Device ID and
> slots were not 1:1.
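
Since the slot and the Device ID can differ, it may be worth printing the
mapping before pulling anything. A minimal sketch, assuming the physical
drive listing from megacli carries the usual "Enclosure Device ID",
"Slot Number", "Device Id" and "Firmware state" lines in that order; the
function name slot_to_devid is made up for illustration.

```shell
# Read `megacli -PDList -aALL` style output on stdin and print one line
# per drive:  enclosure:slot <TAB> device ID <TAB> firmware state
# (slot_to_devid is a hypothetical helper name.)
slot_to_devid() {
    awk -F': *' '
        /^Enclosure Device ID:/ { encl = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Device Id:/           { devid = $2 }
        /^Firmware state:/      { printf "%s:%s\t%s\t%s\n", encl, slot, devid, $2 }
    '
}

# Possible use:
#   megacli -PDList -aALL | slot_to_devid
# The first column is what -physdrv[E:S] takes; the second column is the
# N for smartctl -d sat+megaraid,N.
```
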
>
> Anyway, about a nanosecond after this was discovered the disk at Device ID
> 7 was marked as Failed by the controller whereas previously it had been
> "Online, Spun Up".
> Ugh. At that point the logical volumes were all set read only and the OS
> became barely usable, with commands like "more" no longer functioning.
> Megacli and sshd, thankfully, still worked.  Figuring that I had nothing to
> lose, I removed the replacement disk from slot 7 and put back the original,
> hopefully still good, disk.  That put the system into this state.
>
> slot 4 (device ID 7) failed.
> slot 7 (device ID 5) is Offline.
>
> and
>
> megacli -PDOnline -physdrv[64:7] -a0
>
> put it at
>
> slot 4 (device ID 7) failed.
> slot 7 (device ID 5) Online, Spun Up
>
> The logical volumes were still read only but "more" and most other
> commands now worked again.  Megacli still showed the "degraded" value as
> 1.  I'm still not clear how the two "read only" states differed to cause
> this change.
>
> At that point the failed disk in slot 4 (not 7!) was replaced with the
> new disk (which had been briefly in slot 7) and it immediately began to
> rebuild.  Something on the order of 48 hours later that rebuild completed,
> and the controller set "degraded" back to 0.  However, the logical volumes
> were still read only.  "mount -a" didn't fix it, so the system was rebooted,
> which worked.
>
>
> We have two of these backup systems.  They are supposed to have identical
> contents but do not.  Fixing that is another item on a long todo list.
> RAID 6 would have been a better choice for this much storage, but it does
> not look like this card supports it:
>
>   RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11,
>   PRL 11 with spanning, SRL 3 supported,
>   PRL11-RLQ0 DDF layout with no span,
>   PRL11-RLQ0 DDF layout with span
>
> That rebuild is far too long for comfort.  Had another disk failed in
> those two days that would have been it. Neither controller has battery
> backup, and the one in question is not even on a UPS, so a power glitch
> could be fatal too. Not a happy thought while record SoCal temperatures
> persisted throughout the entire rebuild! The systems are in different
> buildings on the same campus, sharing the same power grid.  There are no
> other backups for most of this data.
>
> Even though the controller shows this system as no longer degraded, should
> I believe that there was no data loss?  I can run checksums on all the
> files (even though it will take forever) and compare the two systems.  But
> as I said previously, the files were not entirely 1:1, so there are
> certainly going to be some files on this system which have no match on the
> other.
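
For the checksum comparison, files that exist on only one system can be
skipped and only the paths present on both compared. A rough sketch,
assuming GNU find, xargs and sha256sum; make_manifest and diff_manifests
are made-up names, and the parsing does not handle file names containing
newlines or backslashes (sha256sum escapes those). A mismatch only says
that the two copies differ, not which one is the good one.

```shell
# Write "hash  ./relative/path" for every regular file under directory $1.
# (make_manifest and diff_manifests are hypothetical helper names.)
make_manifest() {
    ( cd "$1" && find . -type f -print0 | xargs -0 -r sha256sum | sort -k2 )
}

# Print the paths that appear in both manifests with different hashes.
# Paths present in only one manifest are ignored.
diff_manifests() {
    awk '{
        hash = $1
        path = substr($0, length($1) + 3)   # skip the hash and 2 separator chars
        if (NR == FNR) { seen[path] = hash; next }
        if ((path in seen) && seen[path] != hash) print path
    }' "$1" "$2"
}

# Possible use (paths are examples only): run make_manifest on each system,
# copy one manifest across, then:
#   diff_manifests systemA.sha256 systemB.sha256
```
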
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech

-- 
- Andrew "lathama" Latham lathama at gmail.com http://lathama.com
<http://lathama.org> -