[Beowulf] quick note on Redhat NFS issues with NAS units

Joe Landman landman at scalableinformatics.com
Wed Dec 29 05:29:16 PST 2004



Jan-Frode Myklebust wrote:

>On Sun, Dec 26, 2004 at 03:59:18PM -0500, Joe Landman wrote:
>  
>
>>Folks:
>>
>> Been looking into why a Redhat EL3 WS x86_64 client hangs when 
>>accessing a NAS based upon SuSE 9x.  
>>    
>>
>
>Great, thanks for this note!
>
>I've been struggling quite a bit myself with Rocks-3.3 on Opteron
>(IBM e326), with AIX as file-server. I still don't quite understand
>exactly what caused my hangs, but after reverting back to udp, and
>default mount options plus increasing the number of lock-daemons on
>the AIX-server, I now have a stable NFS. Still struggling a bit with
>the NFS performance.
>
>Should maybe test if bcm lets me go back to nfs over tcp.
>  
>

I may have spoken a bit early ...  It works in my test environment and 
on the compute nodes, but fails on the head node.  I can mount and 
unmount, and intr now works.  I can see the top-most directory of the 
mount.  But if I traverse the mount point by one level (say, to any 
subdirectory) and do an ls, or anything else that does a stat, it hangs.  
This happens only on the head node.  The compute nodes work perfectly 
now: no hangs, and none of the above-mentioned behavior.
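
For anyone who wants to check for the same symptom on their own nodes, 
here is a minimal sketch of a probe, in Python.  It runs a single stat 
in a child process and gives up after a timeout, so the shell you run it 
from does not hang along with it.  The function name probe_stat and the 
5 second timeout are my own choices for illustration; nothing here is 
specific to Redhat, SuSE or ROCKS.

```python
import subprocess
import sys

# Child snippet: a single stat() call on the path given as argv[1].
_CHILD = "import os, sys; os.stat(sys.argv[1])"


def probe_stat(path, timeout=5.0):
    """Return 'ok', 'error', or 'hang' for one stat() of path.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send a kill but do not wait for the child to exit: a process
        # blocked on an NFS mount may not go away until the server answers.
        child.kill()
        return "hang"
    # A non-zero exit means stat() raised (e.g. no such file), which is
    # still a prompt answer and therefore not a hang.
    return "ok" if rc == 0 else "error"
```

Call it once on the mount point and once on a subdirectory below it.  
On a node showing the behavior described above, I would expect the first 
call to return 'ok' and the second to return 'hang'.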

I may reload the head node.  I will be trying to force replication of 
this in my lab, but if I cannot, I will do the head node reload.  I am 
starting to suspect some sort of cached state (which is incorrect) on 
the head node.

>  
>
>>ps: if there are some Redhat people reading the list, you know, we would 
>>like some modern kernels, and not lots of backported stuff, not to 
>>mention xfs, and other goodies ... (yeah, I know, wait till EL4, ...)
>>
>>    
>>
>
>Maybe someone should do a kernel-2.6 roll for Rocks...
>  
>

I just pulled down the ROCKS source trees with the intention of rolling 
a 2.6 (with XFS, Trond's and others' NFS patches, and Andi Kleen's x86_64 
bits).  If I get this done soon I'll post a note looking for crash 
dummies^H^H^H^H^H^H^H^H^H^H^H^H^H volunteers to help me test.

Joe

>
>  -jf
>  
>

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615
