[Beowulf] GPFS and failed metadata NSD

Peter St. John peter.st.john at gmail.com
Sat Apr 29 07:36:21 PDT 2017


Just a friendly reminder that while the probability of a particular
coincidence might be very low, the probability that there will be *some*
coincidence is very high.
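
A quick back-of-the-envelope to that effect, using the naive per-pair
figure quoted below (0.00001758) and a purely hypothetical count N of
comparable mirror pairs and failure windows:

  # P(at least one such double failure somewhere) = 1 - (1 - p)^N
  awk 'BEGIN { p = 0.00001758; N = 1000;  print 1 - (1 - p)^N }'   # ~0.017
  awk 'BEGIN { p = 0.00001758; N = 10000; print 1 - (1 - p)^N }'   # ~0.16

What is vanishingly unlikely for one particular pair becomes a
percent-level event once you count every pair and window that could have
produced the same coincidence.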

Peter (pedant)

On Sat, Apr 29, 2017 at 3:00 AM, John Hanks <griznog at gmail.com> wrote:

> Hi,
>
> I'm not getting much useful vendor information so I thought I'd ask here
> in the hopes that a GPFS expert can offer some advice. We have a GPFS
> system which has the following disk config:
>
> [root@grsnas01 ~]# mmlsdisk grsnas_data
> disk         driver   sector     failure holds    holds                            storage
> name         type       size       group metadata data  status        availability pool
> ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
> SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_01   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_02   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_03   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_04   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_05   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_06   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_07   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_08   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_09   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_10   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_11   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_12   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_13   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_14   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_15   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_16   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_17   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_18   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_19   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_20   nsd         512         100 No       Yes   ready         up           system
> SAS_NSD_21   nsd         512         100 No       Yes   ready         up           system
> SSD_NSD_23   nsd         512         200 Yes      No    ready         up           system
> SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
> SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
> SSD_NSD_26   nsd         512         200 Yes      No    ready         up           system
>
> SSD_NSD_25 is a mirror in which both drives have failed due to a series of
> unfortunate events and will not be coming back. From the GPFS
> troubleshooting guide it appears that my only alternative is to run
>
> mmdeldisk grsnas_data  SSD_NSD_25 -p
>
> which the documentation also warns is irreversible, the sky is likely to
> fall, dogs and cats sleeping together, etc. But at this point I'm already
> in an irreversible situation. Of course this is a scratch filesystem, of
> course people were warned repeatedly about the risk of using a scratch
> filesystem that is not backed up, and of course many ignored that. I'd
> like to recover as much as possible here. Can anyone confirm or reject
> that deleting this disk is the best way forward, or suggest other ways to
> recover data from GPFS in this situation?
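
[Inline note: the sequence that question implies would look roughly like
the sketch below. It is only a sketch; the exact flags and prerequisites
(for example whether mmfsck must run against an unmounted filesystem)
need to be checked against the GPFS documentation for your release.

  mmlsdisk grsnas_data                  # confirm only SSD_NSD_25 is down
  mmfsck grsnas_data -n                 # report-only consistency check, if feasible
  mmdeldisk grsnas_data SSD_NSD_25 -p   # the troubleshooting-guide step quoted above
  mmrestripefs grsnas_data -r           # afterwards, restore replication where possible

The mmfsck -n and mmrestripefs -r steps are my assumptions about the
surrounding workflow, not something the quoted guide prescribes.]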
>
> Any input is appreciated. Adding salt to the wound is that until a few
> months ago I had a complete copy of this filesystem that I had made onto
> some new storage as a burn-in test but then removed as that storage was
> consumed... As they say, sometimes you eat the bear, and sometimes, well,
> the bear eats you.
>
> Thanks,
>
> jbh
>
> (Naively calculated probability of these two disks failing close together
> in this array: 0.00001758. I never get this lucky when buying lottery
> tickets.)
> --
> ‘[A] talent for following the ways of yesterday, is not sufficient to
> improve the world of today.’
>  - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>