preemption in PBSPro with MPICH on Linux
Gary Stiehr
gary at umsl.edu
Thu Apr 25 11:48:24 PDT 2002
Hi,
Does anyone have experience using the preemption feature of PBSPro with
MPICH jobs on Linux clusters? I believe the release notes for PBSPro
5.2 say that PBS sends SIGSTOP only to the process that it
started (i.e., the mpirun process). Therefore, if that process started
other processes (as is the case with my MPICH job), the other processes
will continue to run. Does anyone know of a way to suspend all
processes started from an MPICH job?
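
For illustration only, here is a minimal sketch of one way this could be
done on a single node, assuming that mpirun and the processes it starts
locally share one process group. This is not a PBSPro feature, and the
function names (signal_job, suspend_job, resume_job) are hypothetical.
Ranks launched on other nodes through rsh or ssh would not belong to
this group, so the same step would have to be repeated on each node.

```python
import os
import signal


def signal_job(leader_pid: int, sig: int) -> int:
    """Send `sig` to every process in the process group of `leader_pid`.

    Returns the process group ID that was signalled.
    Assumption: the job's local processes all share this process group.
    """
    pgid = os.getpgid(leader_pid)
    # Refuse to signal our own group, which would stop this script too.
    if pgid == os.getpgrp():
        raise ValueError("refusing to signal our own process group")
    os.killpg(pgid, sig)
    return pgid


def suspend_job(leader_pid: int) -> int:
    """Suspend the whole local process group (SIGSTOP cannot be caught)."""
    return signal_job(leader_pid, signal.SIGSTOP)


def resume_job(leader_pid: int) -> int:
    """Resume a process group previously suspended with suspend_job."""
    return signal_job(leader_pid, signal.SIGCONT)
```

Signalling the group rather than the single mpirun PID is what lets the
stop reach processes that mpirun forked locally, as long as none of them
has moved itself into a different process group or session.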
I need to do this because some MPICH jobs last several weeks, and
smaller jobs submitted in the meantime would have to wait unless I use
preemption. I suppose another method would be to make sure that the
long MPICH job checkpoints and then have PBS kill the job after a
certain amount of time.
Any experiences and/or suggestions would be appreciated.
Thanks,
Gary Stiehr
Information Technology Services
University of Missouri - St. Louis
gary at umsl.edu