[Beowulf] Odd NFS write issue for commands issued in a script
David Mathog
mathog at caltech.edu
Wed Dec 9 01:30:14 UTC 2020
Can anybody suggest why a script which causes writes to an NFS mounted
directory like so
ssh remotenode 'command >/usr/common/tmp/outfile.txt'
could somehow fail that write silently, but this variant
ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'
would always succeed?
(Actually it is slightly more complicated than this, because
the whole command string shown above is constructed and then run by
another program through a system() call. Initially this turned up in
a threaded version, but it also happens with a plain, unthreaded
system() call. I cannot reproduce the problem by running the ssh
commands from the command line; it only happens inside the script.
The files so far have been relatively small, less than 50 kB.
"command" is a run of the NCBI blastn program, although that is
probably irrelevant.)
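Since system() normally hands its string to "sh -c", the failing case
can be approximated from a shell wrapper that also checks the result.
The following is only a minimal sketch: run_and_check is a hypothetical
helper name, and treating a missing or empty output file as a failure
is an assumption, not something the original program does.

```shell
#!/bin/sh
# run_and_check CMD OUTFILE   (hypothetical helper, not from the original script)
# Runs CMD the way system() typically would (via "sh -c"), then reports
# the exit status and whether OUTFILE exists and is non-empty.
run_and_check() {
    cmd=$1
    outfile=$2
    sh -c "$cmd"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAIL status=$status"
        return 1
    fi
    if [ ! -s "$outfile" ]; then
        echo "FAIL missing-or-empty"
        return 2
    fi
    echo "OK"
    return 0
}

# Example, using the paths from the message (not run here):
# run_and_check "ssh remotenode 'command >/usr/common/tmp/outfile.txt'" \
#     /usr/common/tmp/outfile.txt
```

A wrapper like this would at least turn a silent failure into a logged
one, and would show whether the exit status is nonzero when the file
goes missing.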
I have even seen this happen:
ssh remotenode 'command >/usr/common/tmp/outfile.txt; ls -al /usr/common/tmp/outfile.txt'
ls -al /usr/common/tmp/outfile.txt
where the first ls (running on the remote node) shows the output file
while the second (running on the NFS server) does not.
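To tell a file that shows up late on the server from one that never
shows up at all, the second ls could be replaced by a short polling
loop. This is a sketch under assumptions: wait_for_file is a
hypothetical helper, and the one-second poll interval and default
timeout are arbitrary choices.

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls once per second until PATH exists, printing how long it took.
# Returns 0 if the file appeared, 1 if the timeout expired first.
wait_for_file() {
    path=$1
    timeout=${2:-10}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "absent after ${waited}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "present after ${waited}s"
    return 0
}

# Example, run on the NFS server right after the ssh command returns:
# wait_for_file /usr/common/tmp/outfile.txt 30
```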
This is on a CentOS 7 system. The server was last updated 8 days ago
but the compute nodes have not been updated in almost a year.
Server kernel is 3.10.0-1160.6.1.el7.x86_64
Client kernel is 3.10.0-1062.12.1.el7.x86_64
There are no error messages in stderr, /var/log/messages, or dmesg.
The client's fstab has:
server:/usr/common /usr/common nfs bg,hard,intr,rw 1 1
and the server's /etc/exports has:
/usr/common *.cluster(rw,sync,no_root_squash)
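The fstab line lists only the options that were requested; the options
actually in effect on a client (NFS version, cache timeouts and so on)
appear in /proc/mounts. A small sketch for pulling them out, where
mount_opts is a hypothetical helper and the optional second argument
exists only so the function can be pointed at a saved copy of the
mount table:

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [TABLE]   (hypothetical helper)
# Prints the filesystem type and the effective mount options for
# MOUNTPOINT, reading TABLE (default /proc/mounts), whose fields are:
#   device mountpoint fstype options dump pass
mount_opts() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '$2 == mp { print $3, $4 }' "$table"
}

# Example, run on a compute node:
# mount_opts /usr/common
```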
Thanks,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech