[Beowulf] Odd NFS write issue for commands issued in a script

David Mathog mathog at caltech.edu
Fri Dec 11 18:30:51 UTC 2020


On 8 Dec 2020 17:30:14 -0800 David Mathog wrote
> Can anybody suggest why a script which causes writes to an NFS mounted
> directory like so
> 
>     ssh remotenode 'command >/usr/common/tmp/outfile.txt'
> 
> could somehow fail that write silently, but this variant
> 
>     ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'
> 
> would always succeed?

Posted my work so far on this problem here:

    https://forums.centos.org/viewtopic.php?f=47&t=76621&p=322143#p322143

(I need to find a test program which will do the same thing, other than
blastn, so that a smaller, simpler test case can be written.)

It seems to be an odd corner case revolving around the NFS server's and 
client's views of the target directory getting out of sync:

1.  ssh command to NFS client causes file with a constant name to be 
written to a target NFS shared directory.
2.  on return from ssh the script on the NFS server creates a 
subdirectory in the target directory and moves the file into it.  On the 
server the time stamps on the target directory are updated.
3.  ssh command as in (1), with the same file name.  The NFS client sees 
a cached version of the target directory which is unchanged since the 
first cycle wrote its file.  So it still sees the "existing" file in the 
target directory and uses that inode for its next write to that file.  
On the server that file is no longer in that directory, so the first 
copy is overwritten in its subdirectory.  I think the NFS server sees an 
operation "write to inode" from the client, and since that inode is now 
in a different directory, the target directory is not updated on the 
client; instead the subdirectory holding that inode is.

(Repeat steps 2 and 3, and all subsequent remote commands will cause the 
first file to be overwritten with new data.)
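
The cycle in steps 1-3 could be driven from the NFS server by something like 
the sketch below.  It is not the original script: the function names, the 
"run_N" subdirectory naming, and the non-empty-file check are all assumptions 
made for illustration.  The writer command is passed in as arguments, so the 
same loop can be tried with an ssh command or with a local stand-in.

```shell
#!/bin/sh
# Sketch of the server-side loop in steps 1-3.  All names here are
# hypothetical; none come from the original script.

# archive_output TARGET_DIR FILENAME CYCLE
# Step 2: make a per-cycle subdirectory and move the output file into it.
archive_output() {
    ao_dir=$1
    ao_file=$2
    ao_cycle=$3
    mkdir -p "$ao_dir/run_$ao_cycle" || return 1
    mv "$ao_dir/$ao_file" "$ao_dir/run_$ao_cycle/$ao_file"
}

# run_cycles TARGET_DIR FILENAME COUNT WRITER [ARGS...]
# Runs WRITER once per cycle (steps 1 and 3), reports whether the server
# sees a non-empty output file in TARGET_DIR, then archives it (step 2).
run_cycles() {
    rc_dir=$1
    rc_file=$2
    rc_count=$3
    shift 3
    rc_i=1
    while [ "$rc_i" -le "$rc_count" ]; do
        "$@" || return 1
        if [ -s "$rc_dir/$rc_file" ]; then
            echo "cycle $rc_i: ok"
            archive_output "$rc_dir" "$rc_file" "$rc_i" || return 1
        else
            echo "cycle $rc_i: output missing or empty in $rc_dir"
        fi
        rc_i=$((rc_i + 1))
    done
}
```

For the case described here the call might look like 
`run_cycles /usr/common/tmp outfile.txt 3 ssh remotenode 'command >/usr/common/tmp/outfile.txt'`. 
If the problem occurs, the second and later cycles would be expected to report 
the output as missing while the copy under run_1 changes.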

This is not the known NFS ext3 issue concerning 1 second date stamps; 
the served directory is ext4.  It does this for both NFS3 and NFS4.

Two workarounds cause the file to be written correctly into the target 
directory each time:

1. #on client, force it to update its target directory information before
    #running the program which creates the file.
    #"somefile" is a filename different than the one used above, for instance
    #a random string.
    touch "$TARGET_DIR/somefile";
    /bin/rm "$TARGET_DIR/somefile";
    #run program to create output file in $TARGET_DIR
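
Workaround 1 can be wrapped in a small function, sketched here.  The function 
name and the ".refresh" prefix are made up for illustration, and mktemp is 
used only as one convenient way to get the random file name.  Whether creating 
and removing a file is enough to refresh the client's view in every setup is 
an observation from this one case, not something the sketch guarantees.

```shell
#!/bin/sh
# refresh_dir_view TARGET_DIR
# Hypothetical helper: create and immediately remove a uniquely named file
# in TARGET_DIR, so the client has to modify (and so re-examine) the
# directory before the real output file is written.
refresh_dir_view() {
    rd_tmp=$(mktemp "$1/.refresh.XXXXXX") || return 1
    rm -f "$rd_tmp"
}
```

It would be run on the NFS client just before the program that writes the 
output, for example `refresh_dir_view "$TARGET_DIR" && command >"$TARGET_DIR/outfile.txt"`.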

2. Direct blastn output to a local file (like "/tmp/output"), then copy 
that to the final destination.  I don't know why this one works, and it 
suggests that blastn's odd "create an empty file and then wait several 
seconds before writing anything" behavior may also play a role.
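
A sketch of workaround 2 as a reusable function follows.  The function name 
and the temporary file naming are assumptions; it uses mv for the final step, 
as in the originally quoted command.  It has to run on the NFS client, so in 
practice it would need to be installed there or inlined into the ssh command.

```shell
#!/bin/sh
# write_then_move DEST COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with stdout going to a local temporary
# file, then move the finished file to DEST (e.g. on the NFS mount).
# Note: mktemp creates the file with mode 0600 and mv keeps that mode.
write_then_move() {
    wm_dest=$1
    shift
    wm_tmp=$(mktemp "${TMPDIR:-/tmp}/output.XXXXXX") || return 1
    if "$@" >"$wm_tmp"; then
        mv "$wm_tmp" "$wm_dest"
    else
        wm_status=$?
        rm -f "$wm_tmp"      # do not leave partial output behind
        return "$wm_status"
    fi
}
```

With this, the NFS directory only ever receives a complete file, which avoids 
the long gap between file creation and the first write.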

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

