[Beowulf] quick note on Redhat NFS issues with NAS units

Jan-Frode Myklebust Jan-Frode.Myklebust at bccs.uib.no
Thu Dec 30 05:53:36 PST 2004


On Wed, Dec 29, 2004 at 08:29:16AM -0500, Joe Landman wrote:
> 
> I may have spoken a bit early ...  

Sorry to hear that.

> It works in my test environment, works 
> on the compute nodes, fails on the head node.  I can mount and unmount, 
> and intr now works.  I can see the top-most directory of the mount.  
> Traverse the mount point by one level (say to any subdirectory) and do 
> an ls, or something that does a stat, and it hangs.  Only on the head 
> node.  Compute nodes work perfectly now.  No hangs.  None of the above 
> mentioned behavior.

My hangs were also only (?) on the head node, but I couldn't reliably
trigger them. After a while (tens of minutes) the hang would start: I
could still list directories, but anything touching the files would
hang, and the only way I found to recover was to reboot.
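
For anyone wanting to watch for this kind of hang without tying up a
shell, here is a minimal sketch, assuming Python 3 on the client. The
function name `stat_probe`, the 5 second timeout and the example path
are hypothetical and not something either of us has used in this
thread. It runs the stat in a child process and gives up after a
deadline instead of blocking.

```python
import subprocess
import sys
import time


def stat_probe(path, timeout=5.0):
    """Stat `path` in a child process.

    Returns "ok" if the stat succeeded, "error" if it failed quickly
    (e.g. missing path), and "hung" if it did not finish in `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return "ok" if child.returncode == 0 else "error"
        time.sleep(0.05)
    # Best effort only: a process blocked on a hard NFS mount may ignore
    # the signal, so we deliberately do not wait for it to exit.
    child.kill()
    return "hung"


if __name__ == "__main__":
    # Hypothetical usage: pass a file or subdirectory below the NFS mount.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(target, stat_probe(target))
```

Run from cron or a loop against a file one level below the mount
point, this would at least record when the hang starts.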

The difference between the head and the compute nodes is mainly that
the head node will typically have a lot more users and processes
active on the filesystems, while the compute nodes will work more
sequentially (open one file, read it, close it, open the next file,
etc.), so maybe increasing the number of lock daemons on the server
was what cured it for me.

Do you have any parameters to tune on your NAS box? Supporting a full
cluster might put a different load on it than it was originally
designed for.


  -jf


