beostatus dies

Dave Johnson ddj at cascv.brown.edu
Tue Oct 2 08:26:35 PDT 2001


On Tue, Oct 02, 2001 at 09:29:06AM -0500, Peter Lindgren wrote:
> I've read the two posts in July from Niall Moran and Roger Williams related to this problem, but haven't seen a reply since then.
> (BTW, I'm running the bz7 version from LinuxCentral.)
> 
> I get the following messages:
> 
> beostat_req: beostat_lib.c: 100 Connection failed: Connection refused <lots of this one>
> shmblk_open: Couldn't open shared memory file: /shm_beostat
> shmblk_open failed
> <beostatus dies>
> 
> Does anyone have a suggestion?
> 
> Thanks...
> 
> Peter Lindgren
> Phone: 847 944 4515
> Fax: 847 517 5889
> E-mail: peter.lindgren at experian.com

Move lines 565 through 568 of libbeostat-0.1.10/recvstats.c inside the test
for failure, starting at what was line 570.  The line numbers in the diff
are off by one because I added support for a second network interface:

----------------
@@ -562,12 +563,12 @@
       /* Accept the connection to get the request code. */
       lib_sock = accept (request_sock, &addr, &addrlen);
 
-      /* Close the old request socket and open a new one. */
-      close (request_sock);
-      request_sock = domain_listen ();
-
       if (lib_sock == -1) {
        perror ("accept");
+
+       /* Close the old request socket and open a new one. */
+       close (request_sock);
+       request_sock = domain_listen ();
        continue;
       } 
 
----------------

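The original version closes and reopens the listening socket on every pass
through the loop, which leaves a gaping hole where the server is not
listening at all.  A client that tries to connect during that window is
presumably what produces the "Connection refused" messages.  The original
request_sock should be valid indefinitely, as accept always returns a new
socket connected to the client.  This version could be beefed up to look at
the reason that accept failed and act accordingly.  The previous version
could not even report that reason reliably, since it clobbered errno by
doing the close and domain_listen before the perror....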
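
For what it's worth, here is a rough sketch of what "look at the reason
that accept failed" might look like.  It is not from libbeostat: the names
classify_accept_error, accept_or_recover and the reopen_listener callback
(standing in for domain_listen, whose definition I have not shown) are made
up for illustration, and the split between "transient" and "fatal" errno
values is my own assumption about sensible policy.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical policy: what to do after accept() fails. */
enum accept_action { ACCEPT_RETRY, ACCEPT_REOPEN };

/* Map an errno value from accept() to an action.  Errors that leave the
 * listening socket intact are retried; anything else is treated as a
 * broken listener.  The grouping here is an assumption, not libbeostat's. */
enum accept_action classify_accept_error(int err)
{
    switch (err) {
    case EINTR:         /* interrupted by a signal */
    case ECONNABORTED:  /* client went away before we accepted */
    case EAGAIN:        /* non-blocking socket, nothing pending */
    case EMFILE:        /* out of descriptors, listener still fine */
    case ENFILE:
    case ENOBUFS:
    case ENOMEM:
        return ACCEPT_RETRY;
    default:            /* EBADF, EINVAL, ENOTSOCK, ... */
        return ACCEPT_REOPEN;
    }
}

/* Accept one connection on *request_sock.  On failure, report the error
 * first, then reopen the listener only if the error calls for it.
 * reopen_listener is a stand-in for something like domain_listen(). */
int accept_or_recover(int *request_sock, int (*reopen_listener)(void))
{
    int lib_sock = accept(*request_sock, NULL, NULL);

    if (lib_sock == -1) {
        int err = errno;        /* save before anything can clobber it */
        perror("accept");

        if (classify_accept_error(err) == ACCEPT_REOPEN) {
            close(*request_sock);
            *request_sock = reopen_listener();
        }
    }
    return lib_sock;
}
```

In the server loop this would replace the accept/perror/close sequence, with
a "continue" whenever accept_or_recover returns -1.  The point is only that
the listener is torn down when it is actually broken, not on every error.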

Perhaps this has already been fixed in the latest release, but I haven't
looked at the SRPMS yet.

	-- ddj



