[Beowulf] quick note on Redhat NFS issues with NAS units
Joe Landman landman at scalableinformatics.com
Sun Dec 26 12:59:18 PST 2004
Folks:

Been looking into why a Redhat EL3 WS x86_64 client hangs when accessing a NAS based upon SuSE 9x. Turns out there are two problems. I can reliably cause the problem to appear/disappear on my test hardware, and I thought others on this group would like to see what I did to make the problems disappear.

Problem manifests itself with RedHat EL3 WS x86_64 clients. I have not been able to replicate it with non-RHEL3 based clients on the same hardware, including FC2/FC3/Ubuntu/SuSE9x/... Problem does not show up in 32 bit mode from what I can tell (need more testing, but preliminary data seems to support this).

Motherboards are Tyan s288x units. All have the Broadcom chipset ethernets. By default RedHat installs tg3 kernel modules to drive these chips. I have not been able to make the problem go away using the tg3 driver.

So I replaced the tg3 driver with the bcm570x driver from Broadcom's download site. This did not make the problem go away, though the NFS mount now respected the intr option (it did not with the tg3).

Next, I changed from udp to tcp. The original fstab line was

    192.168.2.17:/big /big nfs udp,intr,bg 0 0

and the new one is

    192.168.2.17:/big /big nfs tcp,intr,bg,wsize=32768,rsize=32768 0 0

MTU changes did not affect the results (though they did improve some of the test timings).

Without both of these changes, the RedHat client hangs with a simple ls (btw: strace is your friend)

    [root@hammer root]# strace ls /big
    execve("/bin/ls", ["ls", "/big"], [/* 28 vars */]) = 0
    uname({sys="Linux", node="hammer.scalableinformatics.com", ...}) = 0
    brk(0) = 0x513000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9566c000
    open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = 3
    ...
    stat("/big", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
    open("/big", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
    fstat(3, {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
    fcntl(3, F_SETFD, FD_CLOEXEC) = 0
    getdents64(3, <unfinished ...>

With the changes, it ls'es quite nicely with no hangs. I can reliably and repeatably get the hang condition by switching back to udp (and the other mount line). I can reliably and repeatably get the hang condition by switching back to the tg3 driver.

This occurred with the Rocks toolkit (based upon RHEL3 WS). The workaround involves using our finishing scripts.

I figured I would share the solution, as I spent a bit of time tracking it down and trying to reproduce it and solve it.

Joe

ps: if there are some Redhat people reading the list, you know, we would like some modern kernels, and not lots of backported stuff, not to mention xfs, and other goodies ... (yeah, I know, wait till EL4, ...)

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 612 4615
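For anyone who wants to apply the fstab half of the fix across several nodes, here is a minimal sketch of the udp-to-tcp rewrite described in the message. It is not the finishing script mentioned in the message; the function name switch_nfs_to_tcp and its block_size parameter are made up for illustration, and it only handles the fstab side (the driver swap is a separate step). Rewritten lines come back with single-space field separators.

```python
def switch_nfs_to_tcp(fstab_line, block_size=32768):
    """Rewrite one fstab line so an NFS mount uses tcp with explicit rsize/wsize.

    Hypothetical helper: the name and block_size default are assumptions,
    chosen to reproduce the mount options quoted in the message.
    """
    fields = fstab_line.split()
    # Leave comments, blank or malformed lines, and non-NFS entries untouched.
    if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "nfs":
        return fstab_line

    kept = []
    for opt in fields[3].split(","):
        name = opt.split("=", 1)[0]
        # Drop any existing transport or transfer-size options; they are re-added below.
        if name in ("udp", "tcp", "proto", "rsize", "wsize"):
            continue
        kept.append(opt)

    fields[3] = ",".join(
        ["tcp"] + kept + [f"wsize={block_size}", f"rsize={block_size}"]
    )
    return " ".join(fields)


if __name__ == "__main__":
    old = "192.168.2.17:/big /big nfs udp,intr,bg 0 0"
    print(switch_nfs_to_tcp(old))
```

Run against the original line from the message, this should produce the new mount line quoted there. Entries that are not of type nfs pass through unchanged, so it can be mapped over every line of an fstab file.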
