[Beowulf] Troubleshooting NFS stale file handles

Thu Apr 20 02:17:45 PDT 2017

Tim
That reminds me of an issue I found with shared IPMI interfaces: the reserved IPMI port clashing with sunrpc.min_resvport (or, more exactly, the range of Sun RPC reserved ports overlapping with the IPMI port).
That was a long time ago, and as far as I know min_resvport has been raised in modern kernels.
https://www.pantz.org/hardware/ipmi/ipmiconflictwithrpc.html
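
For anyone wanting to sanity-check their own nodes, here is a minimal Python sketch of that overlap test. It assumes the IPMI/RMCP port is 623 (the standard assignment); the function name is made up for illustration, and the first pair of bounds is the CentOS 7 pair listed later in this message.

```python
# Assumption: IPMI-over-LAN (RMCP) listens on port 623.
IPMI_RMCP_PORT = 623


def resvport_range_hits_ipmi(min_resvport, max_resvport, ipmi_port=IPMI_RMCP_PORT):
    """Return True if the Sun RPC reserved-port range includes the IPMI port."""
    return min_resvport <= ipmi_port <= max_resvport


# Values from the CentOS 7 system in this message: 665..1023.
print(resvport_range_hits_ipmi(665, 1023))
# A hypothetical older setting whose lower bound sits below 623.
print(resvport_range_hits_ipmi(600, 1023))
```

If the function returns True, an RPC client socket could in principle be bound to the same port the management controller is listening on.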

Actually, have a look at this page: https://www.novell.com/support/kb/doc.php?id=7007308
The parameter sunrpc.max_shared may be relevant to this issue? Then again, this is old advice too (see the excerpt at the end of this message).
I just looked at a CentOS 7 system with a 3.10.0-514.10.2.el7.x86_64 kernel, and that parameter is no longer present.
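
A rough way to check for a tunable like this on any given node is sketched below in Python. It assumes the usual mapping of a sysctl name to a file under /proc/sys (dots become path separators); the helper names are hypothetical and the result depends on the kernel you run it on.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file path under /proc/sys."""
    return os.path.join(root, *name.split("."))


def sysctl_present(name, root="/proc/sys"):
    """Return True if the running kernel exposes this sysctl."""
    return os.path.exists(sysctl_path(name, root))


for name in ("sunrpc.max_shared", "sunrpc.min_resvport"):
    print(name, "present" if sysctl_present(name) else "absent")
```

Note that the sunrpc entries only show up once the sunrpc module is loaded, so "absent" on a node with no NFS activity does not by itself prove the kernel dropped the parameter.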

On this system the sunrpc entries not referring to rdma are as below.
Prentice, maybe you are coming up against one of these limits?  Though the limits look quite generous.

sunrpc.max_resvport = 1023
sunrpc.min_resvport = 665
sunrpc.nfs_debug = 0x0000
sunrpc.nfsd_debug = 0x0000
sunrpc.nlm_debug = 0x0000
sunrpc.rpc_debug = 0x0000
sunrpc.tcp_fin_timeout = 15
sunrpc.tcp_max_slot_table_entries = 65536
sunrpc.tcp_slot_table_entries = 2
sunrpc.transports = tcp 1048576
sunrpc.transports = udp 32768
sunrpc.transports = tcp-bc 1048576
sunrpc.udp_slot_table_entries = 16
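
If you want to compare these values across nodes, a small Python sketch for parsing such a listing is below. It is only an illustration: the function name is made up, and the sample text is a subset of the lines above. The one wrinkle is that sunrpc.transports appears several times, so each key maps to a list rather than a single value.

```python
from collections import defaultdict

# Subset of the listing above, pasted as text.
SYSCTL_TEXT = """\
sunrpc.max_resvport = 1023
sunrpc.min_resvport = 665
sunrpc.tcp_max_slot_table_entries = 65536
sunrpc.tcp_slot_table_entries = 2
sunrpc.transports = tcp 1048576
sunrpc.transports = udp 32768
sunrpc.transports = tcp-bc 1048576
sunrpc.udp_slot_table_entries = 16
"""


def parse_sysctl(text):
    """Parse 'key = value' lines; repeated keys accumulate into a list."""
    values = defaultdict(list)
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values[key.strip()].append(value.strip())
    return dict(values)


settings = parse_sysctl(SYSCTL_TEXT)
print(settings["sunrpc.transports"])
print(int(settings["sunrpc.tcp_slot_table_entries"][0]))
```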

If many of the mounts point to the same NFS server, it may also help to allow one connection to an NFS server to be shared for several of the mounts.  This is automatic on SLES 11 SP1 and above, but on SLES 10 it is configured with the command:
sysctl -w sunrpc.max_shared=10

-----Original Message-----
From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tim Cutts
Sent: 20 April 2017 10:04
To: Prentice Bisbal <pbisbal at pppl.gov>
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Troubleshooting NFS stale file handles

I've seen, in the past, problems with fragmented packets being misinterpreted, resulting in stale NFS symptoms. In that case it was an Intel STL motherboard (we're talking 20 years ago here), which shared a NIC for management as well as the main interface.  The fragmented packets got inappropriately intercepted by the management processor and never reached Linux.  That took ages to nail down.

One question I was going to ask - which automounter are you using?  autofs or am-utils?

Tim

> On 19 Apr 2017, at 7:11 pm, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>
> Ellis,
>
> Thanks for the suggestion(s). Just this morning I started considering the network as a possible source of error. My stale file handle errors are easily fixed by just restarting the NFS servers with 'service nfs restart', so they aren't as severe as you describe.
>
> Prentice
>
>> On 04/19/2017 02:03 PM, Ellis H. Wilson III wrote:
>> Here are a couple of conditions that I've seen cause stale NFS file handles.  These are rather high-level, just to get you started.  Sorry, short on time today:
>>
>> 1. Are you sure your NFS server isn't getting swamped by the jobs such that it drops packets back to the clients?  Completely overwhelming an NFS server for sufficient lengths of time might cause this, though it's rare.
>>
>> 2. Are you sure that your clients (and the NFS server itself) have a solid network connection?  Frequent network hiccups can trigger stale NFS file handles that occasionally require a hard reboot for me.  This is the more common case I see.
>>
>> Both of these essentially relate to the same thing, which is the connection between the NFS server and clients becoming stalled for too long a time at some point.  In theory NFS should deal with this gracefully, but there are corner-cases (that ironically get hit more often than I feel like they should) where it gets stuck in a way that's rather sticky and tends to require reboot.
>>
>> Best,
>>
>> ellis
>>
>>> On 04/19/2017 01:58 PM, Prentice Bisbal wrote:
>>> Here's the sequence of events:
>>>
>>> 1. First job(s) run fine on the node and complete without error.
>>>
>>> 2. Eventually a job fails with a 'permission denied' error when it
>>> tries to access /l/hostname.
>>>
>>> Since no jobs fail with a file I/O error, it's hard to confirm that
>>> the jobs themselves are causing the problem. However, if these
>>> particular jobs are the only thing running on the cluster and should
>>> be the only jobs accessing these NFS shares, what else could be causing them?
>>>
>>> All these systems are getting their user information from LDAP.
>>> Since some jobs run before these errors appear, lack of, or
>>> inaccurate user info doesn't seem to be a likely source of this
>>> problem, but I'm not ruling anything out at this point.
>>>
>>> Important detail: This is NFSv3.
>>>
>>> Prentice Bisbal
>>> Lead Software Engineer
>>> Princeton Plasma Physics Laboratory
>>> http://www.pppl.gov
>>>
>>>> On 04/19/2017 12:20 PM, Ryan Novosielski wrote:
>>>> Are you saying they can't mount the filesystem, or they can't write
>>>> to a mounted filesystem? Where does this system get its user
>>>> information from, if the latter?
>>>>
>>>> --
>>>> Ryan Novosielski - novosirj at rutgers.edu
>>>> Sr. Technologist - 973/972.0922 (2x0922)
>>>> Rutgers, the State University of NJ - RBHS Campus
>>>> Office of Advanced Research Computing - MSB C630, Newark
>>>>
>>>>> On Apr 19, 2017, at 12:09, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>>>>>
>>>>> Beowulfers,
>>>>>
>>>>> I've been trying to troubleshoot a problem for the past two weeks
>>>>> with no luck. We have a cluster here that runs only one
>>>>> application (although the details of that application change
>>>>> significantly from run to run). Each node in the cluster has an
>>>>> NFS export, /local, that can be automounted by every other node in
>>>>> the cluster as /l/hostname.
>>>>>
>>>>> Starting about two weeks ago, when jobs would try to access
>>>>> /l/hostname, they would get permission denied messages. I tried
>>>>> analyzing this problem by turning on all NFS/RPC logging with
>>>>> rpcdebug and also using tcpdump while trying to manually mount one
>>>>> of the remote systems. Both approaches indicated stale file
>>>>> handles were preventing the share from being mounted.
>>>>>
>>>>> Since it has been 6-8 weeks since there were any seemingly
>>>>> relevant system config changes, I suspect it's an application
>>>>> problem (naturally). On the other hand, the application
>>>>> developers/users insist that they haven't made any changes to
>>>>> their code, either. To be honest, there's no significant evidence
>>>>> indicating either is at fault. Any suggestions on how to debug
>>>>> this and definitively find the root cause of these stale file handles?
>>>>>
>>>>> --
>>>>> Prentice
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>>> Computing To change your subscription (digest mode or unsubscribe)
>>>>> visit http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>>
>>
>
