[Beowulf] Troubleshooting NFS stale file handles

Bernd Schubert bernd.schubert at fastmail.fm
Sun Apr 23 05:17:46 PDT 2017



On 04/20/2017 11:14 PM, Prentice Bisbal wrote:
> On 04/19/2017 05:52 PM, Bernd Schubert wrote:
> 
>>
>> On 04/19/2017 07:58 PM, Prentice Bisbal wrote:
>>> Here's the sequence of events:
>>>
>>> 1. First job(s) run fine on the node and complete without error.
>>>
>>> 2. Eventually a job fails with a 'permission denied' error when it tries
>>> to access /l/hostname.
>> So you don't get ESTALE, but you get EACCES? You *might* be able to fix
>> this by setting the 'no_subtree_check' option in your /etc/exports. I
>> don't remember the details exactly anymore, but without this option
>> nfsd/exportfs checks more strictly whether a dentry is valid.
> 
> I don't remember seeing either ESTALE or EACCES, just that there was a
> message about stale file handles. I didn't save the messages I with

You said "Eventually a job fails with a 'permission denied' error", and
that would be EACCES, not ESTALE?
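
To tell the two errors apart the next time a node misbehaves, something
like the following could be run on the affected client. It is only a
rough sketch: the function name classify_access is made up for
illustration, it relies on the English error text printed by ls, and it
assumes that listing the path triggers the same error the job saw.

```shell
# classify_access PATH: list PATH and report which error, if any, came back.
# Hypothetical helper, not part of nfs-utils or any other package.
classify_access() {
    # Capture only stderr; LC_ALL=C forces the untranslated error strings.
    err=$(LC_ALL=C ls -- "$1" 2>&1 >/dev/null) && { echo "ok"; return 0; }
    case "$err" in
        *"Permission denied"*)         echo "EACCES" ;;
        # Older glibc prints "Stale NFS file handle", newer "Stale file handle".
        *"Stale"*"file handle"*)       echo "ESTALE" ;;
        *"No such file or directory"*) echo "ENOENT" ;;
        *)                             echo "other: $err" ;;
    esac
    return 1
}
```

For example, "classify_access /l/hostname" on the failing node would
print EACCES for a permission problem and ESTALE for a stale handle.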

[...]

>> Btw, which kernel version and file system is your nfs server running on?
> Both servers and clients are running the same exact version of
> everything, since they are using the same NFS root filesystem:
> 
> $ cat /etc/redhat-release
> CentOS release 6.8 (Final)
> 
> $ cat /proc/version
> Linux version 2.6.32-642.11.1.el6.x86_64
> (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat
> 4.4.7-17) (GCC) ) #1 SMP Fri Nov 18 19:25:05 UTC 2016
> 
> $ rpm -qa | grep -i nfs
> nfs-utils-lib-1.1.5-11.el6.x86_64
> nfs-utils-1.2.3-70.el6_8.2.x86_64
> nfs4-acl-tools-0.3.3-8.el6.x86_64

I meant: which local file system (the one underlying the exported
directories) is the NFS server using?
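
One way to find out, run on the NFS server itself, is sketched below. The
function name export_fs_type is made up for illustration, and the sketch
assumes a df that supports -T (as GNU coreutils does).

```shell
# export_fs_type PATH: print the type of the file system that holds PATH.
# Hypothetical helper; run it on the NFS server, not on a client.
export_fs_type() {
    # -P keeps each entry on one line, -T adds the file system type column.
    LC_ALL=C df -PT -- "$1" | awk 'NR == 2 { print $2 }'
}
```

Calling it with the exported directory as the argument should print
something like ext4 or xfs.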

