[Beowulf] Transient NFS Problems in New Cluster
henning.fehrmann at aei.mpg.de
Tue Feb 2 23:28:45 PST 2010
On Tue, Feb 02, 2010 at 02:00:37PM -0800, Jon Forrest wrote:
> I have a new cluster running CentOS 5.3.
> The cluster uses a Sun 7310 storage server
> that provides NFS service over a private
> 1Gb/s ethernet with 9K jumbo frames to the
> We've noticed that a number of the compute
> nodes sometimes generate the
> automount: umount_autofs_indirect: ask umount returned busy /home
> message. When this happens the program running on the
> node dies. This has happened between 10 and 20 times.
> We're not sure what's going on on a node when this
> happens. Most of the time everything is fine and
> the home directories are automounted without problem.
> I've googled for this problem and I see that other people
> have seen it too, but I've never seen a resolution,
> especially not for RHEL5.
I guess the problem is not directly related to RHEL5.
You might want to post this question to
autofs at linux.kernel.org
They will need to know the versions of autofs and of the kernel.
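A minimal sketch for collecting both versions on a node before posting. It assumes an RPM-based system such as CentOS, where the package is simply named autofs; the variable names are only illustrative.

```shell
# Kernel release as reported by the running kernel
kernel=$(uname -r)

# autofs package version; assumes rpm is available and the package is called "autofs"
if command -v rpm >/dev/null 2>&1 && rpm -q autofs >/dev/null 2>&1; then
    autofs=$(rpm -q autofs)
else
    autofs="unknown"
fi

echo "kernel: $kernel"
echo "autofs: $autofs"
```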
> The auto.master line for this mount is
> /home /etc/auto.home --timeout=1200
You could try reducing the timeout. There is no reason not to use a
timeout of 60s. Many things can happen in 1200s, especially on the server side.
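With that change, and nothing else altered, the auto.master line quoted above would read as follows (a sketch of the suggestion, not a tested setting):

```
/home /etc/auto.home --timeout=60
```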
You could also try nolock on the client side and async on the
server side. Users should then make sure that no two processes
write to the same file, to avoid race conditions.
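On the client side, nolock would go into the mount options of the map entry. A sketch of what a wildcard entry in /etc/auto.home might look like; the server name "nfsserver" and the export path "/export/home" are hypothetical, since the actual map was not posted:

```
# /etc/auto.home (hypothetical server name and export path)
*   -rw,nolock   nfsserver:/export/home/&
```

How async is set on the server side depends on the storage server and is not shown here.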