[scyld-users] bpsh in background: defunct processes
Anand Bedekar
anandbedekar at yahoo.com
Tue Sep 30 16:47:02 PDT 2003
Hello,
Thanks for the reply. Some more questions:
-- The cluster is running "Scyld Beowulf Basic Edition
27bz-8" (not RedHat 7: my mistake). Could you tell me
if the bug with processing status and termination
messages was fixed prior to or after this release?
-- Does this release have the beomap and beorun
dynamic scheduling functionality you described?
-- If the answer to either of the above is "no", is
there any alternative way (without upgrading) to make
sure that the processes don't go defunct, e.g. by
somehow sending a signal to the run.sh script called
by bpsh or something? Our sysadmin appears reluctant
to upgrade.
Thanks,
Anand
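
One thing that can be tried on the master without upgrading is to have
the launching script wait for each background job, so that the shell
collects every exit status. This is a minimal sketch, not a confirmed
fix: it does not touch the BProc message-ordering bug described in the
quoted reply below, and whether it prevents defunct processes on 27bz-8
is untested. The function name run_on_nodes and the LAUNCH variable are
hypothetical; the default "bpsh -n" simply mirrors the loop in the
original post.

```shell
#!/bin/sh
# Sketch only. LAUNCH and run_on_nodes are illustrative names, not part
# of Scyld. The default mirrors the "bpsh -n $i run.sh &" loop.
LAUNCH=${LAUNCH:-bpsh -n}

# run_on_nodes COMMAND NODE...
# Starts "$LAUNCH NODE COMMAND" in the background for every node, then
# waits for each job so the calling shell collects its exit status.
run_on_nodes() {
    command=$1
    shift

    pids=""
    for node in "$@"; do
        # $LAUNCH is left unquoted on purpose so "bpsh -n" splits
        # into the program name and its option.
        $LAUNCH "$node" "$command" &
        pids="$pids $!"
    done

    failures=0
    for pid in $pids; do
        # wait PID returns the exit status of that background job.
        if ! wait "$pid"; then
            failures=$((failures + 1))
        fi
    done

    echo "jobs failed: $failures"
    [ "$failures" -eq 0 ]
}
```

It would be called as "run_on_nodes run.sh 1 2 3". Waiting on each PID
separately, instead of a bare "wait", keeps the per-job exit status
available to the script.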
--- Donald Becker <becker at scyld.com> wrote:
> On Mon, 29 Sep 2003, Anand Bedekar wrote:
>
> > I'm trying to run bpsh in a script that calls bpsh in
> > a loop, like this:
> >
> > for i in 1 2 3
> > do
> > bpsh -n $i run.sh &
> > done
>
> Suggestion: you should be using 'beomap' to get a
> dynamic schedule:
>
> for i in `beomap --np 3`; do ...
>
> > What happens is that all the processes called within
> > run.sh seem to go into a "defunct" state without
> > finishing cleanly. This is making the process table
> > fill up, so that no more processes can be run.
>
> This sounds like a long-fixed bug in BProc. The status and
> termination messages were being processed in reverse order.
>
> > Is this usual behaviour when calling bpsh to run a
> > shell script, given the way I am calling
> > 'bpsh -n $i run.sh &' ? Is there some other way to run
> > it?
>
> With our new release there is a command named 'beorun' that
> automatically combines a scheduler mapping with efficiently
> controlling the resulting processes:
> beorun --np 3 command;
>
> > Unfortunately all the nodes in the cluster are
> > currently out of action because the process table is
> > full on all of them, due to the above.
>
> You should be able to restart the cluster nodes in
> about a second...
>
> > So I can't report on which version of scyld has been installed,
> > until the sysadmin reboots the whole thing. I do know
> > the machines are P3 running RedHat 7.0, kernel version
> > 2.2.19.
>
> That doesn't sound like a Scyld release.
>
> --
> Donald Becker becker at scyld.com
> Scyld Computing Corporation http://www.scyld.com
> 914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
> Annapolis MD 21403 410-990-9993
>
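
To see how many defunct entries are accumulating before the process
table fills, the STAT column of ps can be filtered for states that
begin with Z. This is a small sketch, assuming a procps-style ps that
accepts -eo; the helper name filter_defunct is made up for
illustration, and it should be run wherever the process table is
filling up.

```shell
#!/bin/sh
# Sketch only: filter_defunct is an illustrative helper name.
# It reads "ps -eo pid,ppid,stat,comm" style output on stdin and keeps
# the header line plus every row whose STAT field starts with Z.
filter_defunct() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Typical use (assumes ps supports -e and -o):
#   ps -eo pid,ppid,stat,comm | filter_defunct
```

The PPID column shows which parent has not yet collected each defunct
child, which helps tell whether the launching shell or something else
is holding the entries.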