[Beowulf] Odd NFS write issue for commands issued in a script

David Mathog mathog at caltech.edu
Wed Dec 9 01:30:14 UTC 2020


Can anybody suggest why a script that writes to an NFS-mounted 
directory like so

    ssh remotenode 'command >/usr/common/tmp/outfile.txt'

could somehow fail that write silently, but this variant

    ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'

would always succeed?

(Actually it is slightly more complicated than this, because
the whole command string shown above is constructed and then run by 
another program through a system() call.  The problem first turned up in 
a threaded version, but it also happens with a single, non-threaded 
system() call.  I cannot reproduce the problem by running the ssh 
commands from the command line; it only happens inside the script.  The 
files so far have been relatively small, less than 50 kB.  "command" is 
a run of the NCBI blastn program, although that is probably irrelevant.)

I have even seen this happen:

    ssh remotenode 'command >/usr/common/tmp/outfile.txt; ls -al /usr/common/tmp/outfile.txt'
    ls -al /usr/common/tmp/outfile.txt

where the first ls (running on the remote node) shows the output file 
while the second (running on the NFS server) does not.
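
The two-sided check above can be wrapped in a small helper so that each
run reports the remote exit status along with the local view of the file.
This is only a sketch for narrowing the problem down, not the script that
actually shows the failure.  The function name check_nfs_write and the
REMOTE_SHELL variable are made up for illustration; REMOTE_SHELL defaults
to "ssh remotenode" and exists only so the helper can be tried without a
remote node.  It assumes the output path contains no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: run a command "remotely" with stdout redirected to
# an output file, then report what the remote side and the local side see.
# Usage: check_nfs_write OUTFILE 'COMMAND'
check_nfs_write() {
    out=$1
    remote_cmd=$2

    # Remote side: run the command, remember its exit status, list the
    # file, and pass the command's status back. ssh exits with the status
    # of the remote command, or 255 if ssh itself failed.
    # REMOTE_SHELL is deliberately unquoted so "ssh remotenode" splits
    # into words.
    ${REMOTE_SHELL:-ssh remotenode} \
        "$remote_cmd >'$out'; rc=\$?; ls -al '$out'; exit \$rc"
    status=$?
    echo "remote exit status: $status"

    # Local side (the NFS server in the case described above): is the
    # file present and non-empty?
    if [ -s "$out" ]; then
        echo "local view: present"
    else
        echo "local view: missing or empty"
    fi
    return "$status"
}
```

Run on the server as, for example,

    check_nfs_write /usr/common/tmp/outfile.txt 'command'

A nonzero remote exit status would point at the command or at ssh, while
a zero status together with "missing or empty" would match the behaviour
described above.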

This is on a CentOS 7 system.  The server was last updated 8 days ago 
but the compute nodes have not been updated in almost a year.

Server kernel is  3.10.0-1160.6.1.el7.x86_64
Client kernel is  3.10.0-1062.12.1.el7.x86_64

There are no error messages in stderr, /var/log/messages, or dmesg.

The client's fstab has:

   server:/usr/common   /usr/common     nfs     bg,hard,intr,rw 1       1

and the server's /etc/exports has:

   /usr/common      *.cluster(rw,sync,no_root_squash)


Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


