[Beowulf] NFS Read Errors

Wed Dec 5 08:55:50 PST 2007

This tale is at an end, I think, because I can't bear to tell it much 
longer.  As many have suggested, there is probably a hardware 
problem, and since the hardware is old, I will do without the 
services of the troublesome machines -- it turns out that there is 
another acting up as well -- until they are replaced in a couple of weeks.

Many thanks to all who racked their brains for helpful suggestions.

I want to tell a little more of what I have learned, before I drop 
the subject altogether.

First, I did swap the cable of the bad machine with that of a good 
one, with no effect on either machine.  This eliminates the 
possibility of the cable or the switch port being bad.  Since I had 
previously changed out the NIC and the switch, the only remaining 
possibility is something inside the machine itself, probably the 
motherboard, but possibly a corrupted kernel module for handling 
udp -- more on that below.

Second, we could find no sign of this failure in any log.  Nor did 
/proc/net/dev show any errors.  The suggestion is that older kernels 
aren't going to detect and report such errors.  I think that's 
because they do nfs over udp.  More about that in a moment.
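
For anyone who wants to repeat the /proc/net/dev check, here is a minimal Python sketch. It assumes the usual layout of that file (two header lines, then one line per interface with eight receive counters followed by eight transmit counters). The function names are made up for illustration. One caveat, stated as an assumption rather than a diagnosis: on 32-bit kernels the byte counters may wrap around at 4 GiB, which could make a bytes-per-packet average look implausibly small.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: counters}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not the layout this sketch assumes
        nums = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_bytes": nums[0], "rx_packets": nums[1],
            "rx_errs": nums[2], "rx_drop": nums[3],
            "tx_bytes": nums[8], "tx_packets": nums[9],
            "tx_errs": nums[10], "tx_drop": nums[11],
        }
    return stats


def average_packet_size(nbytes, packets):
    """Bytes per packet, or 0.0 when no packets have been counted."""
    return nbytes / packets if packets else 0.0


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        for iface, c in parse_net_dev(f.read()).items():
            print(iface,
                  "rx_errs=%d" % c["rx_errs"],
                  "tx_errs=%d" % c["tx_errs"],
                  "avg_rx=%.1f" % average_packet_size(c["rx_bytes"], c["rx_packets"]),
                  "avg_tx=%.1f" % average_packet_size(c["tx_bytes"], c["tx_packets"]))
```

The counters are cumulative since boot, so taking a reading before and after a test run and subtracting gives a cleaner picture than a single snapshot.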

Third, though netcat isn't on these systems, nc is.  We didn't get 
around to trying it, because we found ttcp.

Fourth, with ttcp over tcp, I found that the troubled machine could 
send 800 MB in about 20 seconds (roughly 40 MB/s) -- the wire speed 
for those 32-bit PCI slots as tested by netpipe.  However, when I used 
ttcp over udp, I couldn't reliably send even ten 8192-byte blocks!  
Successive runs would receive 3, or 1, or 5 blocks.  Don't ask me how 
these two facts are compatible.  I don't know.
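
The udp test can be approximated without ttcp. The sketch below is not ttcp and does not reproduce its options. It is a hypothetical stand-in that sends numbered UDP blocks and reports which sequence numbers never arrived. The 4-byte sequence header and all function names are assumptions made for illustration. As written, the demo runs over loopback. To test a real link, the receiver and sender halves would have to run on the two machines.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian sequence number


def make_block(seq, size):
    """One block: the sequence number, then zero padding up to `size` bytes."""
    return HEADER.pack(seq) + bytes(size - HEADER.size)


def send_blocks(sock, addr, count, size):
    """Send `count` numbered blocks of `size` bytes to `addr`."""
    for seq in range(count):
        sock.sendto(make_block(seq, size), addr)


def receive_blocks(sock, size, timeout=1.0):
    """Collect sequence numbers until nothing arrives for `timeout` seconds."""
    sock.settimeout(timeout)
    seen = []
    while True:
        try:
            data, _ = sock.recvfrom(size)
        except socket.timeout:
            return seen
        if len(data) >= HEADER.size:
            seen.append(HEADER.unpack_from(data)[0])


def missing(seen, count):
    """Sequence numbers in range(count) that were never received."""
    return sorted(set(range(count)) - set(seen))


if __name__ == "__main__":
    count, size = 10, 8192  # ten 8192-byte blocks, as in the ttcp test
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # loopback only; change for a real link
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_blocks(tx, rx.getsockname(), count, size)
    seen = receive_blocks(rx, size)
    print("received", len(seen), "of", count, "missing:", missing(seen, count))
    tx.close()
    rx.close()
```

Because UDP itself never retransmits, any block that is dropped or fails its checksum simply shows up in the missing list, which is the behavior described above.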

Clearly, this puts a premium on using tcp for nfs.  All our attempts 
to do that failed.  Well, both of them, anyway.  In the first one, we 
unmounted the offending disk, modified its fstab entry, and remounted 
it.  We were pretty careful in the second one, where we added tcp to 
the fstab argument, unmounted all the remote disks, restarted all the 
nfsd's, and did 'mount -a'.  We got an error message in both cases 
that didn't obviously refer to the tcp argument, but the mount didn't 
happen.  As I write this, I see references to tcp mount requests in 
the mountd man page, so maybe we need to do a bit more here.
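
When a tcp mount does succeed, it is worth confirming which transport is actually in use. Here is a minimal Python sketch that reads /proc/mounts-style text and reports the transport for each NFS mount. It assumes the transport shows up in the options field either as a bare "tcp"/"udp" word or as "proto=...". Which form appears depends on the kernel, so treat both as assumptions to verify on your own systems. The function name is made up.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to 'tcp', 'udp', or 'unknown'."""
    result = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(parts) < 4 or parts[2] not in ("nfs", "nfs4"):
            continue
        proto = "unknown"
        for opt in parts[3].split(","):
            if opt in ("tcp", "udp"):
                proto = opt
            elif opt.startswith("proto="):
                proto = opt.split("=", 1)[1]
        result[parts[1]] = proto
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint, proto in sorted(nfs_transports(f.read()).items()):
            print(mountpoint, proto)
```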

The Wikipedia article on nfs says this:  "At the time of introduction 
of Version 3, vendor support for TCP as a transport-layer protocol 
began increasing. While several vendors had already added support for 
NFS Version 2 with TCP as a transport, Sun Microsystems added support 
for TCP as a transport for NFS at the same time it added support for 
Version 3."

I'd like to know what version of nfs this server supports, but the 
man page on nfsd doesn't say.  The man page on rpc.mountd says that 
it supports nfs version 2 and version 3, but that "If the NFS kernel 
module was compiled without support for NFSv3, rpc.mountd must be 
invoked with the option --no-nfs-version 3."  Yet the 
/proc/<pid>/cmdline for the running rpc.mountd doesn't show a 
--no-nfs-version argument, which suggests (but doesn't prove) that 
version 3 is supported.  Clearly, both the kernel and the server 
need to support the use of tcp.
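
One way to answer the version question is to ask the server's portmapper what it has registered. The sketch below parses the output of `rpcinfo -p <host>` and lists the (version, transport) pairs for the NFS program, RPC program number 100003. It assumes the usual column layout of program, version, protocol, port, and service name. The function names and the default host are made up. A registration only shows what the server offers. The client kernel still has to support the same combination.

```python
import subprocess

NFS_PROGRAM = "100003"  # RPC program number assigned to NFS


def parse_rpcinfo(text, program=NFS_PROGRAM):
    """Return sorted (version, protocol) pairs registered for `program`."""
    found = set()
    for line in text.splitlines():
        parts = line.split()
        # Expected columns: program, vers, proto, port, [service]
        if len(parts) >= 4 and parts[0] == program and parts[1].isdigit():
            found.add((int(parts[1]), parts[2]))
    return sorted(found)


def query_server(host):
    """Run `rpcinfo -p host` and parse the result (needs rpcinfo installed)."""
    out = subprocess.run(["rpcinfo", "-p", host],
                         capture_output=True, text=True, check=True).stdout
    return parse_rpcinfo(out)


if __name__ == "__main__":
    import sys
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for version, proto in query_server(host):
        print("NFS v%d over %s" % (version, proto))
```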

I'd like to move our other machines with these older kernels at 
other sites to tcp for nfs where possible, in order to avoid 
this in the future.  We are already seeing signs of network problems 
on them.  If that's not possible, then in order to avoid a complete 
rebuild of those systems -- there are 12 of them -- we are going to 
put a testing script together that runs md5sum remotely and compares 
the results with recorded local results.
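
A rough Python sketch of what such a testing script might look like follows. The use of ssh for the remote invocation, the host and path arguments, and every function name are assumptions made for illustration. Only md5sum and the idea of comparing against a recorded local result come from the plan above.

```python
import hashlib
import subprocess


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of a local file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum(output):
    """Extract the hex digest from md5sum output ('<digest>  <path>')."""
    return output.split()[0].lower()


def remote_md5(host, path):
    """Run md5sum on `host` over ssh (assumes key-based ssh access)."""
    out = subprocess.run(["ssh", host, "md5sum", path],
                         capture_output=True, text=True, check=True).stdout
    return parse_md5sum(out)


def mismatches(expected, results):
    """Return (run index, digest) for every run that differs from `expected`."""
    return [(i, r) for i, r in enumerate(results) if r != expected]


if __name__ == "__main__":
    import sys
    host, path, runs = sys.argv[1], sys.argv[2], int(sys.argv[3])
    expected = md5_of_file(path)  # recorded local result
    results = [remote_md5(host, path) for _ in range(runs)]
    bad = mismatches(expected, results)
    print("%d of %d runs differ from %s" % (len(bad), runs, expected))
```

Repeating the remote checksum several times matters here, since the failure was intermittent: a single matching run would not show much.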

Thanks again!

Mike

At 08:54 AM 12/4/2007, you wrote:
>Mark,
>
>Thanks for your helpful comments.
>
>At 11:31 PM 12/3/2007, you wrote:
>>>I am guessing you are using TCP NFS mounts as well?  TCP forces 
>>>retries in the event of bad packets.  UDP doesn't force this, but 
>>>the NFS protocol will
>>
>>UDP has a checksum as well, though it's only 16b.  then again, the TCP
>>checksum isn't all that strong for today's data rates either.
>
> From reading the man page on nfs on the systems with the 2.4 
> kernels, it looks like the default for an nfs mount is udp.  It 
> also looks like tcp is not really an option until nfs v4, so it may 
> be something to try on the 2.6 kernels that I have on some of my 
> newer machines at another site.
>
>>you should definitely examine /proc/net/dev on involved machines.
>
>I hadn't known about /proc/net/dev.  When I check there, I see no 
>transmit errors on the server side and no receive errors on the 
>client side.  That's odd, because the other thing I see is that the 
>average packet size received (bytes received divided by packets 
>received) on the client side is 3.9, while on the server side, the 
>average packet size sent is 1430.  In other words, there are many 
>more packets received than there ought to be.  That's very 
>fishy.  It's probably the result of the way the packet count is done 
>and reported.  I.e., it may be that all the received packets -- good 
>and bad -- are counted, but only the bytes in the good ones are 
>counted, with some similar problem on the server side.  I think the 
>statistics are aggregate since the last boot, so they may not be 
>just from the troublesome tests I was performing, either.
>
>>I would attempt to reduce the complexity of your testing.
>>for instance, can a node write and verify to its local disk
>>without problem?
>
>The local disk read seems rock solid in comparison to the NFS 
>one.  The local md5sum produces the same result time after time, 
>which is just not the case for the remote.
>
>>can it stream data over tcp sockets (netcat or the like) without 
>>corruption or obvious problems reflected
>>in /proc/net/dev?
>
>netcat is not on my systems.  Looks like I have to get someone to 
>download and build it for me, and try the streaming tests you recommend.
>
>>does ethtool tell you anything about the config of the nic?
>
>Not on the 2.4 systems, though it seems to tell me a little on the 2.6's.
>
>>comparing tcp vs udp NFS would be sensible
>>as well - varying the packet size, too.  switching client and/or 
>>server to a modern 2.6 kernel may be instructive.
>
>Upgrading the kernel is probably the only way I'll get nfs over 
>tcp.  Given that these systems are headed out the door, I'm not sure 
>that's a good use of our time.  But it may be worth doing on our new 
>and newer systems.
>
>Thanks again!
>
>
>Mike