[Beowulf] Transient NFS Problems in New Cluster
Prentice Bisbal
prentice at ias.edu
Wed Feb 3 07:22:17 PST 2010
Jon Forrest wrote:
> I have a new cluster running CentOS 5.3.
> The cluster uses a Sun 7310 storage server
> that provides NFS service over a private
> 1Gb/s ethernet with 9K jumbo frames to the
> cluster.
>
> We've noticed that a number of the compute
> nodes sometimes generate the
>
> automount[15023]: umount_autofs_indirect: ask umount returned busy /home
>
> message. When this happens the program running on the
> node dies. This has happened between 10 and 20 times.
> We're not sure what's going on on a node when this
> happens. Most of the time everything is fine and
> the home directories are automounted without problem.
>
> I've googled for this problem and I see that other people
> have seen it too, but I've never seen a resolution,
> especially not for RHEL5.
>
> The auto.master line for this mount is
>
> /home /etc/auto.home --timeout=1200 noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768
>
> The network interface configuration is
>
Jon,
I had this exact same problem a couple of weeks ago after changing the
automounting scheme on our network, which required all nodes to reread the
automounter configuration. It only happened on a few nodes.
My only solution was to reboot the nodes with the problem. After rebooting,
'service autofs reload' or 'service autofs restart' worked without a
problem.
I'm sure that's not the answer you were looking for, but that's all I've
got. Sorry. I suspect it's a bug in the automount daemon, but I can't
prove it.
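
If it helps to see which mounts are affected and how often, here is a rough
Python sketch that scans syslog lines for the message you quoted. It is only
an illustration: the function names are made up for this example, the pattern
is built from the single log line in your mail, and it assumes each message
sits on one line.

```python
import re
from collections import Counter

# Pattern built from the one message quoted in this thread; other autofs
# versions may word it differently (assumption).
BUSY_RE = re.compile(
    r"automount\[(?P<pid>\d+)\]: umount_autofs_indirect: "
    r"ask umount returned busy (?P<mount>\S+)"
)


def find_busy_umounts(lines):
    """Return (pid, mount_point) for every 'returned busy' line found."""
    hits = []
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            hits.append((int(match.group("pid")), match.group("mount")))
    return hits


def count_by_mount(lines):
    """Count 'returned busy' messages per mount point."""
    return Counter(mount for _pid, mount in find_busy_umounts(lines))
```

Passing an open log file to count_by_mount() on each node (I would guess
/var/log/messages on a stock CentOS 5 install, but check your syslog
configuration) gives a per-mount count that can be compared across nodes.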
--
Prentice