node status
Robert G. Brown rgb at phy.duke.edu
Mon Oct 8 08:24:51 PDT 2001

On Mon, 8 Oct 2001, Bob Campbell wrote:

> Okay, thanks for all the viewpoints on NIS, looks like rsync
> is the best way to go.
>
> There was mention of ways to help recognise and sync downed
> nodes. What I would like is a tool of some sort that would
> keep a live status of all the nodes, and what state they are in.
> These 'states' could be as simple as UP, DOWN, BOOTING, FAILED, etc.
> I would also like this to insert and remove the hosts from the list
> of available hosts.
>
> .....
>
> Ok, I know Beowulf2 already has this. Beowulf2 looks to do all this
> with bproc though, and I need to stay in userspace. For various
> reasons I need to stay with vendor supplied kernels so I cant
> compile bproc in.
>
> any thoughts on this, or anyone know of any software with similar
> features?

procstatd (available on http://www.phy.duke.edu/brahma) is at least one
way of doing it that is fairly low overhead. I'm not sure about BOOTING
and FAILED -- those are distinguishable from DOWN only by inference --
but you can retrieve a wealth of proc-based information in a simple
ascii-packed packet, including uptime, load averages, network traffic
averages, and the like. You can easily use the daemon to feed a perl
script or a website on a central host.

I should note that when a host goes down it is not at all easy to tell
from the outside. Did it go down or is it busy? Did it go down or did
the network connection to the host go away for some reason? Even
querying the host daemon over the network requires either a TCP timeout
or consistently unanswered UDP queries before you can "guess" that the
host is down (presuming that you otherwise trust your network's
stability).

Alternatively you can use an NFS mount. Each host writes its state
information into e.g. /usr/share/beowulf/bX where X is the node id, and
an application that shares the same mount can open all the bX's,
compile a table, and display it or take action however you like.
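
As a rough illustration of the NFS approach, here is a minimal Python
sketch of the collecting side. It is only a sketch under assumptions
that are mine, not anything procstatd or an existing tool defines: each
bX file holds a single state word on its first line, every node rewrites
its file periodically, and a file that has not been touched for
`stale_after` seconds is reported as "DOWN?" (a guess, for the reasons
given above). The function names are hypothetical.

```python
import os
import re
import time

# Per-node file names of the form b0, b1, ... (the bX naming from the text).
NODE_FILE = re.compile(r"^b(\d+)$")


def read_node_states(state_dir, stale_after=60.0, now=None):
    """Return {node_id: state} built from the per-node files in state_dir.

    Assumed file format (hypothetical): the first line is a state word
    such as UP or BOOTING. A file older than stale_after seconds is
    reported as "DOWN?" because the node has stopped updating it.
    """
    now = time.time() if now is None else now
    table = {}
    for name in os.listdir(state_dir):
        match = NODE_FILE.match(name)
        if not match:
            continue  # ignore anything that is not a bX file
        path = os.path.join(state_dir, name)
        try:
            age = now - os.path.getmtime(path)
            with open(path) as handle:
                state = handle.readline().strip() or "UNKNOWN"
        except OSError:
            # File vanished or is unreadable; we cannot say more than that.
            state, age = "UNKNOWN", 0.0
        if age > stale_after:
            state = "DOWN?"
        table[int(match.group(1))] = state
    return table


def format_table(table):
    """Render the table as one 'bX  STATE' line per node, sorted by id."""
    return "\n".join(
        f"b{node:<4} {state}" for node, state in sorted(table.items())
    )
```

Calling `print(format_table(read_node_states("/usr/share/beowulf")))`
from cron or a small loop on the head node would give the live table.
Note that the staleness test compares the file's mtime against the
reader's clock, so it assumes node and server clocks are roughly in
sync, and NFS attribute caching can delay what the reader sees.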

Just pinging a host gives you approximate up/down information -- at the
very least if it pings it MIGHT be up and running normally, while if it
doesn't ping there are likely problems with the host or the network.

Finally, there are (remote) shell-based methods, but they are all going
to be moderately expensive in systems resources, as shells are
moderately expensive and remote shells more so.

Hope some of this helps.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
