[Beowulf] Troubleshooting NFS stale file handles
Prentice Bisbal
pbisbal at pppl.gov
Mon Apr 24 13:31:02 PDT 2017
On 04/23/2017 08:17 AM, Bernd Schubert wrote:
>
> On 04/20/2017 11:14 PM, Prentice Bisbal wrote:
>> On 04/19/2017 05:52 PM, Bernd Schubert wrote:
>>
>>> On 04/19/2017 07:58 PM, Prentice Bisbal wrote:
>>>> Here's the sequence of events:
>>>>
>>>> 1. First job(s) run fine on the node and complete without error.
>>>>
>>>> 2. Eventually a job fails with a 'permission denied' error when it tries
>>>> to access /l/hostname.
>>> So you don't get ESTALE, but you get EACCESS? You *might* be able to fix
>>> this by setting the 'no_subtree_check' in your /etc/exports. I don't
>>> remember the details exactly anymore, but nfsd/exportfs check more
>>> intensively if a dentry is valid if this option is not given.
>> I don't remember seeing either ESTALE or EACCESS, just that there was a
>> message about stale file handles. I didn't save the messages I with
> You said "Eventually a job fails with a 'permission denied'" and that is
> access and not ESTALE?
Okay, this is a confusing point. When I cd to /l/hostname, the error
returned by the OS is "Permission denied". That's it. When I try to
mount the file system manually while watching the traffic with tcpdump,
I see the server send a "Stale file handle" message back to the
client, but I don't see the words ESTALE or EACCES explicitly in those
messages, or in the rpcdebug output when I turn on RPC debugging.
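One way to settle the ESTALE vs. EACCES question on the client is to check which error the directory lookup itself fails with. Below is a minimal sketch of a hypothetical helper, `probe_dir` (not something from this thread). It assumes bash and GNU coreutils, and it works by matching the English error text that `ls` prints, so treat it as a rough diagnostic. Running the failing command under `strace` shows the errno name directly, if strace is installed.

```shell
#!/bin/bash
# Hypothetical helper: report which error a directory lookup fails with.
# Resolving "$dir/." needs search (execute) permission on $dir, the same
# check cd performs, so the failure should match what the job sees.
probe_dir() {
    local dir=$1 err
    # LC_ALL=C keeps coreutils messages in English so the patterns match.
    # stderr is captured; stdout is discarded.
    if err=$(LC_ALL=C ls -d -- "$dir/." 2>&1 >/dev/null); then
        echo "ok"
        return 0
    fi
    case $err in
        *"Permission denied"*) echo "EACCES" ;;
        # Older glibc releases print "Stale NFS file handle",
        # newer ones "Stale file handle"; match both.
        *"Stale"*)             echo "ESTALE" ;;
        *)                     echo "other: $err" ;;
    esac
    return 1
}
```

Usage on an affected node would be `probe_dir /l/hostname`. Note that running it as root bypasses ordinary permission checks on local directories, so run it as the same user the failing job runs as.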
>
> [...]
>
>>> Btw, which kernel version and file system is your nfs server running on?
>> Both servers and clients are running the same exact version of
>> everything, since they are using the same NFS root filesystem:
>>
>> $ cat /etc/redhat-release
>> CentOS release 6.8 (Final)
>>
>> $ cat /proc/version
>> Linux version 2.6.32-642.11.1.el6.x86_64
>> (mockbuild at c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat
>> 4.4.7-17) (GCC) ) #1 SMP Fri Nov 18 19:25:05 UTC 2016
>>
>> $ rpm -qa | grep -i nfs
>> nfs-utils-lib-1.1.5-11.el6.x86_64
>> nfs-utils-1.2.3-70.el6_8.2.x86_64
>> nfs4-acl-tools-0.3.3-8.el6.x86_64
> I mean what is the file system the NFS server is running on?
Oh. Oops. ext4
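Bernd's `no_subtree_check` suggestion amounts to a single option on the server's export line. Below is a minimal sketch of such an `/etc/exports` entry. The exported path and client network are hypothetical placeholders, not the configuration used here. According to the exports(5) man page, nfs-utils 1.1.0 and later already default to `no_subtree_check`, so with nfs-utils 1.2.3 the option mainly matters if `subtree_check` has been set explicitly.

```
# /etc/exports on the NFS server
# Hypothetical export path and client network; substitute your own.
/export/l  10.0.0.0/24(rw,sync,no_subtree_check)
```

After editing the file, `exportfs -ra` re-exports everything and `exportfs -v` lists the options that are actually in effect.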