[Beowulf] Transient NFS Problems in New Cluster

Prentice Bisbal prentice at ias.edu
Wed Feb 3 07:22:17 PST 2010



Jon Forrest wrote:
> I have a new cluster running CentOS 5.3.
> The cluster uses a Sun 7310 storage server
> that provides NFS service over a private
> 1Gb/s ethernet with 9K jumbo frames to the
> cluster.
> 
> We've noticed that a number of the compute
> nodes sometimes generate the
> 
> automount[15023]: umount_autofs_indirect: ask umount returned busy /home
> 
> message. When this happens the program running on the
> node dies. This has happened between 10 and 20 times.
> We're not sure what's going on on a node when this
> happens. Most of the time everything is fine and
> the home directories are automounted without problem.
> 
> I've googled for this problem and I see that other people
> have seen it too, but I've never seen a resolution,
> especially not for RHEL5.
> 
> The auto.master line for this mount is
> 
> /home  /etc/auto.home  --timeout=1200 noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768
> 
> The network interface configuration is
> 

Jon,

I had this exact same problem a couple of weeks ago after changing the
automounting scheme on our network, which required all nodes to reread
the automounter configuration. It only happened on a few nodes.

My only solution was to reboot the nodes with the problem. After
rebooting, 'service autofs reload' or 'service autofs restart' worked
without a problem.
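
One thing that may be worth checking before resorting to a reboot: a
"busy" result from umount generally means some process on the node still
has a working directory or an open file under the mount point. The
following is only a rough sketch, assuming a Linux /proc filesystem and
root privileges to read other users' entries. The function name
pids_using and its proc_root parameter are made up for illustration;
they are not part of autofs or any existing tool.

```python
import os


def pids_using(mount_point, proc_root="/proc"):
    """Return sorted PIDs whose cwd or open files live under mount_point.

    Hypothetical helper: it only inspects the cwd and fd symlinks in
    /proc, so it will miss other ways of pinning a mount (for example
    memory-mapped files or nested mounts).
    """
    base = mount_point.rstrip("/")
    prefix = base + "/"
    found = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished or is unreadable
            if target == base or target.startswith(prefix):
                found.add(int(entry))
                break
    return sorted(found)


if __name__ == "__main__":
    # Print the PIDs that appear to be holding /home on this node.
    for pid in pids_using("/home"):
        print(pid)
```

If this prints any PIDs on an affected node, those processes are the
first place to look. If it prints nothing and the umount still reports
busy, that would be consistent with the problem being in the automounter
itself rather than in anything running on the node.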

I'm sure that's not the answer you were looking for, but that's all I've
got. Sorry. I suspect it's a bug in the automount daemon, but I can't
prove it.


-- 
Prentice


