[Beowulf] Troubleshooting NFS stale file handles
pbisbal at pppl.gov
Wed Apr 19 11:34:43 PDT 2017
On 04/19/2017 02:17 PM, Ellis H. Wilson III wrote:
> On 04/19/2017 02:11 PM, Prentice Bisbal wrote:
>> Thanks for the suggestion(s). Just this morning I started considering
>> the network as a possible source of error. My stale file handle errors
>> are easily fixed by just restarting the nfs servers with 'service nfs
>> restart', so they aren't as severe as you describe.
> If a restart on solely the /server-side/ gets you back into a good
> state this is an interesting tidbit.
That is correct; restarting NFS on the server side is all it takes to
fix the problem.
> Do you have some form of HA setup for NFS? Automatic failover
> (sometimes set up with IP aliasing) in the face of network hiccups can
> occasionally goof the clients if they aren't set up properly to keep up
> with the change. A restart of the server will likely revert back to
> using the primary, resulting in the clients thinking everything is
> back up and healthy again. This situation varies so much between
> vendors it's hard to say much more without more details on your setup.
My setup isn't nearly that complicated. Every node in this cluster has a
/local directory that is shared out to the other nodes in the cluster.
The other nodes automount this remote directory as /l/hostname, where
"hostname" is the name of the owner of the filesystem. For example, hostB
will mount hostA:/local as /l/hostA.
No fancy fail-over or anything like that.
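
A minimal sketch of that naming scheme, plus a probe for stale handles,
assuming the /l/<owner> pattern described above. The function names and
the host list are hypothetical, made up for illustration and not taken
from the actual cluster configuration. The probe relies only on stat()
failing with ESTALE, which is how a stale NFS file handle is reported
to user space.

```python
import errno
import os


def mount_point(owner):
    """Path under which other nodes see owner's /local (assumed /l/<owner>)."""
    return os.path.join("/l", owner)


def nfs_source(owner):
    """NFS source string for owner's shared /local directory."""
    return f"{owner}:/local"


def is_stale(path):
    """Return True only if stat() on path fails with ESTALE."""
    try:
        os.stat(path)
    except OSError as exc:
        # Other errors (ENOENT, EACCES, ...) are not stale handles.
        return exc.errno == errno.ESTALE
    return False


def stale_mounts(local_host, hosts):
    """List the /l/<owner> paths on this node that report a stale handle."""
    return [
        mount_point(h)
        for h in hosts
        if h != local_host and is_stale(mount_point(h))
    ]
```

Run on hostB with a hypothetical host list, stale_mounts("hostB",
["hostA", "hostB", "hostC"]) would check /l/hostA and /l/hostC. Note
that calling stat() on an automounted path triggers the mount, so the
probe itself generates NFS traffic.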
> P.S., apologies for the top-post last time around.
No worries. I'm so used to people doing that on mailing lists that I've
become numb to it.