Writing/Reading Files
Josip Loncaric josip at icase.edu
Wed May 8 09:14:36 PDT 2002
"Robert G. Brown" wrote:
>
> On Tue, 7 May 2002, Timothy W. Moore wrote:
>
> > This is getting frustrating. I have an application where each node
> > creates its own data file. When I go to process these files with a
> > serial application on the host, it can only read the first timestep
> > contained within the file. Could this have something to do with NFS...
>
> I am still having a bit of trouble visualizing your difficulty.

Perhaps this is similar to a problem one of our users had. His parallel
optimization code works like this:

(1) process 0 writes many input files, each of which defines a test case
(2) MPI_Barrier(MPI_COMM_WORLD);
(3) many processes compute their own test cases, write own output files
(4) MPI_Barrier(MPI_COMM_WORLD);
(5) process 0 reads output files then selects new test cases
(6) repeat from (1) until some criterion is satisfied

The problem was this: MPI_Barrier takes only a few hundred microseconds
after the last file is written, while NFS may take longer to complete
writing the last file to the NFS server. This coding style is
particularly hard on NFS since the server gets bombarded with 20-30
simultaneous writes just before step (4), which means that the server
can (and sometimes does) run out of nfsd threads, necessitating many
retries etc. Meanwhile, since process 0 got going again within
microseconds, it can find that the file it wants to read is incomplete
or missing.

The solution was to insert a 3 second "sleep" after the MPI_Barrier() in
step (4).

Sincerely,
Josip

P.S. This NFS behavior is by design, and "noac" does not help much. In
general, NFS is a slow and unreliable method of passing data between
processes. MPI is much better -- IF you have source code access. In
this case, only binary executables using input/output files were
available...

P.P.S. As someone has pointed out here, one should check
/proc/net/rpc/nfsd on the NFS server, where the last number on the "th"
line indicates the number of times all nfsd threads were in use. If
this seems high, increase the number of nfsd threads by editing the
RPCNFSDCOUNT=8 line in the /etc/rc.d/init.d/nfs script. We use
RPCNFSDCOUNT=32 on our main server now. Moreover, if you use "soft" NFS
mounts on the NFS clients, increase the number of retransmissions
(default is retrans=3) to something like retrans=10. Finally, use the
"noac" attribute when mounting NFS filesystems (required by MPI-IO).

--
Dr. Josip Loncaric, Research Fellow         mailto:josip at icase.edu
ICASE, Mail Stop 132C         PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center       mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA  Tel. +1 757 864-2192  Fax +1 757 864-6134
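
The /proc/net/rpc/nfsd check described in the P.P.S. could be scripted roughly as follows. This is only a sketch: the function name parse_th_line and the sample numbers are made up for illustration, and it reads the "th" line exactly as the post describes it (first number taken as the thread count, last number taken as the "all threads busy" indicator). The actual field layout may differ between kernel versions, so treat the interpretation as an assumption and compare against the raw file on your own server.

```python
def parse_th_line(stats_text):
    """Return (thread_count, last_value) from the "th" line of nfsd stats.

    Assumption (taken from the post, not verified against any kernel):
    the first number after "th" is the nfsd thread count, and the last
    number on the line indicates how often all threads were in use.
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "th":
            return int(fields[1]), float(fields[-1])
    raise ValueError('no "th" line found in nfsd statistics')


# Hypothetical sample text, NOT real server output.
sample_stats = (
    "rc 0 0 0\n"
    "th 8 12 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.25\n"
)

threads, busy_indicator = parse_th_line(sample_stats)
print(f"nfsd threads: {threads}, last 'th' value: {busy_indicator}")
```

On a real NFS server one would pass in the contents of /proc/net/rpc/nfsd instead of the sample string. If the last value keeps growing under load, that would be the cue, per the post, to raise RPCNFSDCOUNT.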
More information about the Beowulf mailing list
