[Beowulf] after update sgeexecd not starting correctly on reboot


David Mathog mathog at caltech.edu
Tue Nov 25 14:40:38 PST 2008


This is an odd one, and I hope one of you has seen it and fixed it,
because the only way I have been able to trigger the bug is through a
reboot.  

I updated one node from Mandriva 2007.1 to 2008.1.  Those are both 2.6.x
kernels, and are as you might guess about a year apart.  Both use
the exact same SGE distribution, which is NFS mounted on /usr/SGE6.
On a reboot of the newer system, /etc/rc.d/init.d/sgeexecd, which is the
last thing to start in runlevel 3 (except for S99local, which doesn't do
anything except "touch  /var/lock/subsys/local") fails.  First it
spews a bunch of lines which look like a script did "set", and as a side
effect, this pushes all the other text lines off the console, and then
it emits

  can't determine path to Grid Engine binaries

without starting sge_execd.  On the older system the exact same script
starts up with none of this drama, leaving sge_execd running.

However, once I log on as root at the console on the newer system,
sge_execd happily starts up with:

/etc/rc.d/init.d/sgeexecd start

There are no SGE variables defined in .bashrc etc. The init script
has these prerequisites, as on the older system:

# Provides:       sgeexecd 
# Required-Start: $network $remote_fs

Ring any bells?  

I think maybe the NFS mounting is different, so that the remote_fs
prerequisite isn't really satisfied, even though the associated script
has run.  The sgeexecd script does include a test:

while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
   count=`expr $count + 1`
   sleep 1
done

but since SGE_ROOT is the mount point, that directory exists whether or
not the NFS mount has completed, so the loop exits immediately.  Maybe
I'll change the test to $SGE_ROOT/bin and see if it helps.
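
A minimal sketch of that change, assuming the fix is simply to wait for
something that only exists inside the NFS export rather than for the
mount point itself.  The function name wait_for_path and its timeout
argument are hypothetical, not part of the SGE init script; the 120
second limit is taken from the loop above.

```shell
#!/bin/sh
# Hypothetical helper (not from the SGE distribution):
#   wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists or the timeout expires.
# Returns 0 if PATH exists at the end, non-zero otherwise.
wait_for_path() {
    path=$1
    timeout=${2:-120}
    count=0
    # Two separate tests instead of "[ ... -a ... ]", which POSIX
    # marks as obsolescent.
    while [ ! -e "$path" ] && [ "$count" -lt "$timeout" ]; do
        count=$((count + 1))
        sleep 1
    done
    [ -e "$path" ]
}

# Assumed usage inside the init script, before the binaries are looked up:
#   wait_for_path "$SGE_ROOT/bin" 120 || exit 1
```

Testing a path below the mount point only helps if the local, unmounted
/usr/SGE6 directory is empty; if it has its own bin subdirectory, the
check passes too early just as before.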


Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


