[Beowulf] NFS Read Errors

Mark Hahn hahn at mcmaster.ca
Mon Dec 3 22:31:05 PST 2007

> does the same thing with a slightly simpler syntax.  There is mounting 
> evidence that you should use sha1sum rather than md5sum.

for general integrity checking (i.e. nothing security-related), md5 is still fine.
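
as a rough sketch of that kind of non-security check, here is one way to
digest a file in chunks with python's hashlib.  the function name
file_digest and the 1 MiB chunk size are my own choices for illustration,
nothing from the original thread; pass algo="sha1" if you prefer it.

```python
import hashlib


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in chunks.

    `file_digest` and `chunk_size` are illustrative names, not from the thread.
    """
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        # read fixed-size chunks so large files never sit in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

running it on the server-side copy and on the NFS-mounted copy and
comparing the two strings is the same check md5sum gives you.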

> I am guessing you are using TCP NFS mounts as well?  TCP forces retries in 
> the event of bad packets.  UDP doesn't force this, but the NFS protocol will

UDP has a checksum as well, though it's only 16 bits.  then again, the TCP
checksum is the same kind of 16-bit sum, so it isn't all that strong for
today's data rates either.
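
to illustrate why a 16-bit sum is weak, here is a minimal sketch of the
ones'-complement Internet checksum (the RFC 1071 algorithm that both UDP
and TCP use).  it covers only the payload bytes you hand it; the real
UDP/TCP checksum also covers a pseudo-header, which I leave out here.  the
sample byte strings are made up for the demonstration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


# made-up payload: swapping two 16-bit words changes the data,
# but the sum (and therefore the checksum) stays the same
original = b"\x12\x34\xab\xcd\x00\x01\xff\xfe"
swapped = original[2:4] + original[0:2] + original[4:]
assert original != swapped
assert internet_checksum(original) == internet_checksum(swapped)
```

reordered words are just one class of undetected corruption; independent
of that, a random error has roughly a 1 in 65536 chance of slipping past
any 16-bit check.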

you should definitely examine /proc/net/dev on the involved machines and
look for nonzero error or drop counters.
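
a sketch of pulling the interesting counters out of /proc/net/dev, assuming
the usual two-line header with '|'-separated receive and transmit column
names.  the function names and the list of "problem" columns are my own
picks, not anything standard.

```python
# columns worth a look when nonzero (my own selection)
PROBLEM_SUFFIXES = ("errs", "drop", "fifo", "frame", "colls", "carrier")


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {iface: {counter: value}}."""
    lines = text.splitlines()
    # second header line looks like: " face |bytes packets ...|bytes packets ..."
    _, rx_names, tx_names = lines[1].split("|")
    fields = ["rx_" + n for n in rx_names.split()]
    fields += ["tx_" + n for n in tx_names.split()]
    stats = {}
    for line in lines[2:]:
        if ":" not in line:
            continue
        iface, _, values = line.rpartition(":")
        stats[iface.strip()] = dict(zip(fields, map(int, values.split())))
    return stats


def nonzero_problem_counters(stats):
    """Keep only the error-type counters that are nonzero, per interface."""
    report = {}
    for iface, counters in stats.items():
        bad = {k: v for k, v in counters.items()
               if k.split("_", 1)[1] in PROBLEM_SUFFIXES and v}
        if bad:
            report[iface] = bad
    return report
```

on a linux box you would feed it open("/proc/net/dev").read(), ideally
before and after a test transfer, and compare the two reports.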

>> We are in the process of upgrading and thus replacing all the machines we 
>> have of that configuration due to space limitations and their age, but I'm 
>> still curious what the problem could be.

I would attempt to reduce the complexity of your testing.
for instance, can a node write and verify to its local disk
without problem?  can it stream data over TCP sockets (netcat 
or the like) without corruption or obvious problems reflected
in /proc/net/dev?  does ethtool tell you anything about the 
config of the NIC?  comparing TCP vs UDP NFS would be sensible
as well - varying the packet size, too.  switching client and/or 
server to a modern 2.6 kernel may be instructive.
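
the local-disk step could be sketched like this: write random data, fsync,
read it back, and compare digests.  write_and_verify and its default sizes
are placeholders of mine.  one caveat: the read-back may well be served
from the page cache, so to actually exercise the disk you would want a
file larger than RAM or to drop caches between the two passes.

```python
import hashlib
import os


def write_and_verify(path, total_bytes=64 << 20, chunk_size=1 << 20):
    """Write random data to `path`, read it back, return True if digests match."""
    written = hashlib.md5()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of userspace and kernel buffers
    read_back = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()
```

pointing the same function at a path on the NFS mount gives a like-for-like
comparison with the local result.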

