[Beowulf] Odd NFS write issue for commands issued in a script
David Mathog
mathog at caltech.edu
Fri Dec 11 18:30:51 UTC 2020
On 8 Dec 2020 17:30:14 -0800 David Mathog wrote:
> Can anybody suggest why a script which causes writes to an NFS mounted
> directory like so
>
> ssh remotenode 'command >/usr/common/tmp/outfile.txt'
>
> could somehow fail that write silently, but this variant
>
> ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'
>
> would always succeed?
Posted my work so far on this problem here:
https://forums.centos.org/viewtopic.php?f=47&t=76621&p=322143#p322143
(I need to find a test program other than blastn that does the same
thing, so that a smaller, simpler test case can be written.)
It seems to be an odd corner case revolving around the NFS server's and
client's views of the target directory getting out of sync:
1. An ssh command to the NFS client causes a file with a constant name
to be written to a target NFS-shared directory.
2. On return from ssh, the script on the NFS server creates a
subdirectory in the target directory and moves the file into it. On the
server, the timestamps on the target directory are updated.
3. An ssh command as in (1), with the same file name. The NFS client
sees a cached version of the target directory that is unchanged since
the first cycle wrote its file. So it still sees the "existing" file in
the target directory and uses that inode for its next write to that
file. On the server that file is no longer in that directory, so the
first copy is overwritten in its subdirectory. I think the NFS server
sees a "write to inode" operation from the client, and since that inode
is now in a different directory, the target directory is not updated;
instead, the subdirectory holding that inode is.
(Repeat steps 2 and 3, and every subsequent remote command will
overwrite the first file with new data.)
This is not the known NFS issue with ext3's 1-second timestamp
resolution; the served directory is on ext4. It happens with both NFSv3
and NFSv4.
Two workarounds cause the file to be written correctly into the target
directory every time:
1. #On the client, force it to update its target directory information
   #before running the program which creates the file.
   #"somefile" is a filename different from the one used above, for
   #instance a random string.
   touch "$TARGET_DIR/somefile"
   /bin/rm "$TARGET_DIR/somefile"
   #Run the program to create the output file in $TARGET_DIR.
2. Direct blastn output to a local file (like "/tmp/output"), then copy
that to the final destination. I don't know why this one works, and it
suggests that blastn's odd "create an empty file and then wait several
seconds before writing anything" behavior may also play a role.
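The two workarounds can be packaged as small POSIX sh functions for use
on the client. This is a sketch under assumptions: the function names
and the ".refresh."/"out." prefixes are hypothetical, mktemp supplies
the random name that workaround 1 calls for (assuming its file creation
refreshes the client's view of the directory just as touch does), and
the 0644 mode in workaround 2 is an arbitrary choice.

```shell
#!/bin/sh
# Workaround 1: refresh_dir_view DIR
# Create and immediately delete a uniquely named scratch file in DIR,
# standing in for the touch/rm pair with a random file name.
refresh_dir_view() {
    rdv_scratch=$(mktemp "$1/.refresh.XXXXXX") || return 1
    /bin/rm -f "$rdv_scratch"
}

# Workaround 2: run_via_local_tmp DEST CMD [ARGS...]
# Run CMD with stdout sent to a private file on local disk, then move the
# finished file to DEST. DEST is only created after CMD has exited.
run_via_local_tmp() {
    rvlt_dest=$1
    shift
    rvlt_tmp=$(mktemp "${TMPDIR:-/tmp}/out.XXXXXX") || return 1
    # mktemp creates the file with mode 0600; widen it (0644 is an
    # assumption) so the moved file is readable like a redirected one.
    chmod 644 "$rvlt_tmp"
    if "$@" > "$rvlt_tmp"; then
        mv "$rvlt_tmp" "$rvlt_dest"
    else
        rvlt_rc=$?              # exit status of the failed CMD
        /bin/rm -f "$rvlt_tmp"
        return "$rvlt_rc"
    fi
}
```

On the client these would be used as, for example,
refresh_dir_view "$TARGET_DIR" && command >"$TARGET_DIR/outfile.txt"
or run_via_local_tmp "$TARGET_DIR/outfile.txt" command, where "command"
is the placeholder from the original question (blastn and its
arguments). Because the commands run through ssh, the functions would
have to be defined in the remote shell (for instance in a script on the
client), not only in the calling script.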
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech