Cluster programming...

Karl Bellve Karl.Bellve at
Fri Jan 24 11:54:08 PST 2003

I want to thank everyone for the suggestions.

First, one problem was that some nodes did not have the fftw libraries 
installed, which caused them to fail without writing out a result. 
I corrected that.

Now I get two different behaviors when I run my application:

1) Locking activated. Every node except the master node writes properly 
to the file. If I take the master node out of the possible choices, 
everything works fine. I am not sure what is up with the master node; it 
should be identical to the others, except that it has two Ethernet cards. 
Once in a while, I also see another node fail to write its output.

2) Locking not activated, just seek and write. The master node now writes 
properly, but I get random drops from other nodes, and not always from the 
same node. Some runs show no drops.
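For reference, the locked seek-and-write pattern in case 1 can be sketched as follows. This is a minimal Python sketch, not the application's actual code: it assumes each job writes a fixed-size record at offset `job_id * RECORD_SIZE` (`RECORD_SIZE` and `write_record` are hypothetical names) and takes a POSIX byte-range lock with `fcntl.lockf` on just that record. Over NFS (v2/v3) such locks are handled by the lock manager (lockd), so they only help if it is running and working on the server and every client.

```python
import fcntl
import os

# Hypothetical record layout: job N owns bytes [N * RECORD_SIZE, (N + 1) * RECORD_SIZE).
RECORD_SIZE = 16


def write_record(path, job_id, payload):
    """Write one job's result into its own slot of a shared file under a byte-range lock."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload does not fit in one record")
    record = payload.ljust(RECORD_SIZE, b"\0")  # pad to a fixed size
    offset = job_id * RECORD_SIZE
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Blocking exclusive POSIX lock (F_SETLKW) on this record's bytes only.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            # pwrite combines the seek and the write in one call.
            if os.pwrite(fd, record, offset) != RECORD_SIZE:
                raise OSError("short write")
            os.fsync(fd)  # push the data to the server while the lock is still held
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Locking only the record's byte range, rather than the whole file, lets nodes that write different records proceed in parallel.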

I might go with your suggestion of cat'ing the files together at the end.
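The separate-files-then-concatenate approach can be sketched like this. It is a minimal Python sketch, assuming every job has an integer id below 1,000,000 and that the final file should be in job order; the `part.NNNNNN` naming and the functions `write_part` and `concatenate_parts` are hypothetical. Each job writes its own file, so no locking is needed, and the merge is roughly what `cat part.* > bigfile` does in the shell.

```python
import glob
import os
import shutil


def write_part(outdir, job_id, data):
    """Write one job's result to a file that no other job touches."""
    final = os.path.join(outdir, "part.%06d" % job_id)  # hypothetical naming scheme
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)  # only complete files ever appear under the final name
    return final


def concatenate_parts(outdir, bigfile):
    """Like `cat part.* > bigfile`: zero-padded names sort in job order."""
    parts = sorted(
        p for p in glob.glob(os.path.join(outdir, "part.*")) if not p.endswith(".tmp")
    )
    with open(bigfile, "wb") as out:
        for path in parts:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
    return parts
```

Because every job owns its file, a node that finishes early never waits on another node; the merge runs once, after all jobs are done.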

I decided against using a lock file. Although I send out jobs 
sequentially, they won't finish sequentially, which would delay some 
nodes in getting another job when they finish early.

Jakob Oestergaard wrote:

>On Wed, Jan 22, 2003 at 10:52:31AM -0500, Karl Bellve wrote:
>>I am running into a little problem about multiple writes to a single 
>>file via NFS.
>Ok, first of all that sounds like a bad idea to begin with.
>Why not have each node write its own file, and run a "cat node.* >
>bigfile" afterwards?
>Quadratisch, praktisch, gut (square, practical, good) ;)


Karl Bellve, Ph.D.                   ICQ # 13956200
Biomedical Imaging Group             TLCA# 7938 		
University of Massachusetts
Email: Karl.Bellve at
Phone: (508) 856-6514
Fax:   (508) 856-1840
PGP Public key: finger kdb at

More information about the Beowulf mailing list