[scyld-users] bpsh in background: defunct processes

Anand Bedekar anandbedekar at yahoo.com
Tue Sep 30 16:47:02 PDT 2003


Hello, 
Thanks for the reply. Some more questions: 

-- The cluster is running "Scyld Beowulf Basic Edition
27bz-8" (not RedHat 7: my mistake). Could you tell me
whether the bug in the ordering of status and termination
messages was fixed before or after this release?

-- Does this release have the beomap and beorun
dynamic scheduling functionality you described? 

-- If the answer to either of the above is "no", is
there any alternative way (without upgrading) to make
sure that the processes don't go defunct, for example by
sending a signal to the run.sh script that bpsh starts?
Our sysadmin appears reluctant to upgrade.
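
A minimal sketch of one script-side option, assuming the defunct
entries are ordinary exited children that the calling shell never
waited for (this may not hold if the BProc message-ordering bug
mentioned in the quoted reply is responsible): record each background
PID and reap it with 'wait'. The run_on_node helper and its local
fallback are hypothetical, added only so the sketch runs on a machine
without bpsh; the 'bpsh -n $i run.sh' call is copied from the quoted
loop.

```shell
#!/bin/sh
# Sketch only: start one background job per node, then reap every child.

# Hypothetical helper. On the cluster it runs the same command as the
# quoted loop; elsewhere it falls back to a no-op so the sketch still runs.
run_on_node() {
    if command -v bpsh >/dev/null 2>&1; then
        bpsh -n "$1" run.sh
    else
        true
    fi
}

pids=""
for i in 1 2 3
do
    run_on_node "$i" &
    pids="$pids $!"        # remember the PID of each background job
done

# Reap each child explicitly and count non-zero exit statuses.
failed=0
for pid in $pids
do
    wait "$pid" || failed=$((failed + 1))
done

echo "failed=$failed"
```

Waiting on each PID, rather than calling a bare 'wait', keeps the exit
status of every job available to the script. It only cleans up children
of the calling shell on the master; it does not by itself address
processes left defunct on the compute nodes.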


Thanks,
Anand
--- Donald Becker <becker at scyld.com> wrote:
> On Mon, 29 Sep 2003, Anand Bedekar wrote:
> 
> > I'm trying to run bpsh in a script that calls bpsh in
> > a loop, like this:
> > 
> > for i in 1 2 3
> > do
> >     bpsh -n $i run.sh &
> > done
> 
> Suggestion: you should be using 'beomap' to get a
> dynamic schedule:
> 
> for i in `beomap --np 3`; do ...
> 
> > What happens is that all the processes called within
> > run.sh seem to go into a "defunct" state without
> > finishing cleanly. This is making the process table
> > fill up, so that no more processes can be run.
> 
> This sounds like a long-fixed bug in the BProc.  The status and
> termination messages were being processed in reverse order.
> 
> > Is this usual behaviour when calling bpsh to run a
> > shell script, given the way I am calling
> > 'bpsh -n $i run.sh &' ? Is there some other way to run
> > it?
> 
> With our new release there is a command named 'beorun' that
> automatically combines a scheduler mapping with efficiently controlling
> the resulting processes:
>    beorun --np 3  command;
> 
> > Unfortunately all the nodes in the cluster are
> > currently out of action because the process table is
> > full on all of them, due to the above.
> 
> You should be able to restart the cluster nodes in
> about a second...
> 
> > So I can't report on which version of scyld has been installed,
> > until the sysadmin reboots the whole thing. I do know
> > the machines are P3 running RedHat 7.0, kernel version
> > 2.2.19.
> 
> That doesn't sound like a Scyld release.
> 
> -- 
> Donald Becker				becker at scyld.com
> Scyld Computing Corporation		http://www.scyld.com
> 914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
> Annapolis MD 21403			410-990-9993
> 

