Cluster programming...

Jakob Oestergaard jakob at unthought.net
Thu Jan 23 23:57:39 PST 2003


On Wed, Jan 22, 2003 at 10:52:31AM -0500, Karl Bellve wrote:
> 
> I am running into a little problem about multiple writes to a single 
> file via NFS.

Ok, first of all that sounds like a bad idea to begin with.

Why not have each node write its own file, and run a "cat node.* >
bigfile" afterwards?

Quadratisch, praktisch, gut (square, practical, good)  ;)
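
To make the one-file-per-node idea concrete, here is a minimal sketch in C.
The function name write_node_file(), the "prefix.NNNN" naming scheme and the
integer node id are assumptions of mine for illustration, not anything from
Karl's application. The id is zero-padded so that the shell expands "node.*"
in numeric order when the files are concatenated afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write this node's result to its own file,
 * named "<prefix>.<node_id>" with the id zero-padded to 4 digits.
 * Returns 0 on success, -1 on failure. */
int write_node_file(const char *prefix, int node_id,
                    const void *data, size_t len)
{
    assert(prefix != NULL && node_id >= 0);

    char path[4096];
    int n = snprintf(path, sizeof path, "%s.%04d", prefix, node_id);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* name did not fit */

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = (fwrite(data, 1, len, fp) == len);

    /* fclose() flushes buffered data; check its result too, since a
     * write error may only be reported when the file is closed. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Each node would call something like write_node_file("node", my_id, result,
result_len), and a single "cat node.* > bigfile" on one machine assembles the
pieces once all nodes are done. No two nodes ever touch the same file, so no
locking is involved.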

> An application is spawned on a number of nodes. When they are done, they 
> all write to a specific, but non-overlapping area of the NFS mounted 
> file.

If the parts are non-overlapping, I assume that the offset and data
length of each node's write are fixed - correct?

> I use fcntl(fd, F_SETLKW, &lck) to lock the file, or wait until it 
> can lock the file for writing. fcntl() is capable of locking across NFS. 

If I was correct above - why do you need to lock the file?

An lseek() + write() (or a single pwrite()) should do the trick as I see
it - but maybe there's something I don't see  :)
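
As a rough sketch of what I mean, assuming each node knows its own fixed
offset and length: the helper below (write_slot() is a name I made up) writes
one node's region with pwrite(), which combines the seek and the write in one
call, and takes no lock at all. Whether this is enough over NFS depends on how
the clients cache data, which the sketch does not try to address.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() and fsync() prototypes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf into the file at
 * byte offset off, without any locking.
 * Returns 0 on success, -1 on failure (errno is set). */
int write_slot(const char *path, off_t off, const void *buf, size_t len)
{
    assert(path != NULL && off >= 0);

    /* No O_TRUNC: other nodes' regions must be left alone. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, retry */
        if (n <= 0) {
            int saved = (n == 0) ? EIO : errno;
            close(fd);
            errno = saved;
            return -1;
        }
        /* Short write: advance and write the remainder. */
        p   += n;
        off += n;
        len -= (size_t)n;
    }

    /* Push the data to the server before closing. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A node with index i and a fixed record size would call it as, for example,
write_slot("/nfs/bigfile", (off_t)i * record_size, result, record_size); the
path and the record_size variable are placeholders here.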

> However, some nodes fail to write their result to the file. It isn't the 
> same nodes every time. I am not seeing any write errors. I tend to think 
> it is an NFS caching issue. All writes get flushed before releasing the 
> lock via fsync() and close().
> 
> The fileserver is a Red Hat 8.0 system. I upgraded to the latest kernel 
> offered for RH8.0. That didn't fix the problem. I compiled a new kernel 
> (2.4.20) and that didn't fix the problem. The nodes are Alphas running 
> RH6.2.
> 
> I am thinking about alternate means of locking but fcntl() should do the 
> trick.

I completely agree with you that locking should work - and you have
already received many good suggestions from fellow 'wolfers on how to
test/check/improve the locking on your systems.

What I'm curious about is whether you need locking at all.  While it
should of course work, avoiding it would sidestep the problem completely.

-- 
................................................................
:   jakob at unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


