[Beowulf] Determining PID of a multi-processor PBS job ; mpi cleanup of tasks

Rayson Ho rayson2 at eseenet.com
Tue Jul 6 17:45:59 PDT 2004


You can use either of these:

1) mpiexec
It uses the PBS TM interface to control the MPI jobs, and you can easily
add code to clean up the shared memory segments and semaphores.

2) mpicleanup
http://bellatrix.pcl.ox.ac.uk/~ben/pbs/mpicleanup.c
It comes from "Cleanup of MPICH/PBS jobs":
http://bellatrix.pcl.ox.ac.uk/~ben/pbs/
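
Whichever option you pick, the cleanup step first has to decide which
shared memory segments are leftovers. Below is a minimal Python sketch of
that step only. It assumes a Linux node where /proc/sysvipc/shm is
available and has a header row naming the "shmid", "nattch" and "uid"
columns. The function name orphaned_shm_ids is made up for illustration,
and it is not taken from mpiexec or mpicleanup.c. A segment with zero
attachments is only a candidate: something else may still intend to
attach to it, so check before removing anything.

```python
def orphaned_shm_ids(table_text, uid):
    """Return shmids owned by `uid` that have no attached processes.

    `table_text` is expected to look like the contents of
    /proc/sysvipc/shm: one header row, then one row per segment.
    (Hypothetical helper, for illustration only.)
    """
    lines = table_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    # Map column names to positions so the code does not depend on
    # the exact column order.
    col = {name: i for i, name in enumerate(header)}
    ids = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue  # skip malformed or truncated rows
        owner = int(fields[col["uid"]])
        attached = int(fields[col["nattch"]])
        if owner == uid and attached == 0:
            ids.append(int(fields[col["shmid"]]))
    return ids
```

On a node you would feed it open("/proc/sysvipc/shm").read() and the job
owner's numeric uid, then hand the resulting ids to whatever removal tool
you use. The same idea should carry over to semaphores.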

Rayson

>I was trying to write a script that cleans up orphaned
>MPI tasks, and their associated shared memory segments
>and semaphores, for a particular MPI job submitted via
>PBS (actually OpenPBS).
>
>For this I need to determine the PID associated with
>the particular PBS MPI job.  Let's say the MPI job runs
>on 5 processors.  On the 1st processor I am able to go
>through the <proc_id> entries in the /proc directory,
>search for the variable PBS_JOBNAME in each
>"/proc/<proc_id>/environ" file, and determine the
>corresponding parent PID of the PBS job.  But the same
>trick does not work on the other processors.
>
>Question: Is there an easy way of determining all
>the PPIDs on the different hosts corresponding to a
>multi-processor PBS MPI job?
>
>Help: If someone can actually mail me an MPI cleanup
>script (and install instructions) that cleans up all
>orphaned MPI processes and their associated shm and
>semaphores, it will save me a lot of trouble writing
>the script.  And I will be indebted to you for life.
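
For reference, the /proc scan described in the quoted message can be
sketched in Python roughly as below. This is only an illustration of that
approach, not a fix for it: it finds processes whose environment contains
the variable, so it has the same limitation on the other nodes that the
original poster ran into. The names parse_environ and find_job_pids are
hypothetical, and the proc_root parameter exists only so the scan can be
pointed at a test directory.

```python
import os


def parse_environ(raw):
    """Parse the NUL-separated bytes of a /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_job_pids(var, value, proc_root="/proc"):
    """Return sorted PIDs under proc_root whose environment has var == value.

    (Hypothetical helper, for illustration only.)
    """
    pids = []
    for name in os.listdir(proc_root):
        if not (name.isascii() and name.isdigit()):
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "environ"), "rb") as f:
                env = parse_environ(f.read())
        except OSError:
            continue  # process exited, or we may not read its environ
        if env.get(var) == value:
            pids.append(int(name))
    return sorted(pids)
```

Something like find_job_pids("PBS_JOBNAME", "myjob") would then be run on
each node of the job, as root or as the job owner, since other users'
environ files are normally not readable.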