Parallel batch jobs on beowulf?

Gary Stiehr gary at umsl.edu
Tue Oct 2 07:45:08 PDT 2001


Hi...

Eray Ozkural (exa) wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi Gary,


...

> 
> On Tuesday 02 October 2001 01:34 am, Gary Stiehr wrote:
> I have one question though. How would I prevent users from starting parallel 
> jobs while a PBS job is running? Since parallel codes don't run in a flash, 
> it's highly likely that another user might, unknowingly, interfere with a 
> parallel job. (and perturb its wall-clock performance) Is there a nice way to 
> 'lock' a node when a parallel job starts on a node and 'release' it when it 
> terminates so that no other user process can be started on it while the 
> parallel job is running?
> 
> Since all nodes are accessed via rsh it could be on that level, or at a lower 
> level I guess, but I'm uncertain as to how this should be implemented in a 
> reliable way (so that it does not corrupt otherwise normal operations)
> 
> 



Without some sort of middleware, there is no way that I know of to stop 
this.  We simply have a policy that all users must submit their jobs 
via PBS.  *If* everyone actually does so, and you have configured your 
nodes as "exclusive" nodes, you get exactly this effect of locking and 
releasing nodes.  The challenge is interfacing all of your users' 
programs with PBS, although that may be easy depending on how the 
programs are written (with MPI, PVM, etc.).  Once you start using PBS, 
you will see what I mean.  Feel free to ask me questions about this 
and/or subscribe to the pbs-users mailing list.
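
To make the "exclusive" versus "time-shared" distinction concrete, here 
is a rough sketch in Python that writes out the text of a PBS server 
nodes file.  It assumes the OpenPBS convention in which a plain host 
name is an exclusive (cluster) node and a ":ts" suffix marks a 
time-shared node; check the administrator guide for your PBS version 
before relying on it.  The host names and the function name are made 
up for illustration.

```python
def render_nodes_file(hostnames, time_shared=False):
    """Return the text of a PBS server nodes file, one host per line.

    Assumption: a bare host name is an exclusive (cluster) node and a
    ":ts" suffix marks a time-shared node, as in OpenPBS.
    """
    suffix = ":ts" if time_shared else ""
    return "".join(f"{host}{suffix}\n" for host in hostnames)


# Hypothetical host names, for illustration only.
hosts = ["node01", "node02", "node03"]

exclusive_text = render_nodes_file(hosts)
shared_text = render_nodes_file(hosts, time_shared=True)

print(exclusive_text, end="")
print(shared_text, end="")
```

Exclusive nodes only give the locking effect if users go through PBS; 
nothing here stops someone from using rsh to reach a node directly.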


>>    You can also set PBS to use your nodes in a "time-shared" manner.
>>This way, PBS can allocate more than one job to each node.  When you
>>need to have only one job per node, you can configure the nodes as
>>"exclusive" nodes.  I hope this helps.
>>
> 
> It seems I'd use exclusive nodes. Could I also allocate them for a specific 
> time interval? For instance, having batch jobs run in the night... But I 
> don't know if that would be a good sol'n.
> 


You can do all sorts of things with PBS, including scheduling the time 
at which a job should start (of course, the appropriate number of nodes 
has to be free at that time).  You can also impose resource limits, 
such as the amount of CPU time that a job uses.  As Justin Moore's 
reply to this thread suggested, having separate PBS queues would be a 
solution, too.  I would really suggest looking through the PBS 
documentation to find out all of the things you can do with PBS.
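
As a rough illustration of the options above, here is a small Python 
sketch that builds a PBS job script with a deferred start time, 
resource limits, and a target queue.  The "#PBS" directives used (-N, 
-q, -l, -a) are standard qsub options, but the queue name "night", the 
job name, the limits, and the program path are all hypothetical; 
adjust them to your site.

```python
from datetime import datetime


def build_job_script(name, queue, nodes, walltime, cput, start, command):
    """Return the text of a PBS job script as a string.

    `start` is a datetime; qsub's -a option takes it as CCYYMMDDhhmm.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",                  # job name
        f"#PBS -q {queue}",                 # destination queue
        f"#PBS -l nodes={nodes}",           # number of nodes requested
        f"#PBS -l walltime={walltime}",     # wall-clock limit
        f"#PBS -l cput={cput}",             # CPU time limit
        f"#PBS -a {start.strftime('%Y%m%d%H%M')}",  # earliest start time
        "cd $PBS_O_WORKDIR",                # directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: a night-time run on four nodes.
script = build_job_script(
    name="nightrun",
    queue="night",
    nodes=4,
    walltime="06:00:00",
    cput="20:00:00",
    start=datetime(2001, 10, 3, 1, 0),
    command="./my_parallel_program",
)
print(script, end="")
```

You would save the output to a file and hand it to qsub.  The job 
becomes eligible to run at the -a time, but it still waits until the 
requested nodes are actually free.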

--
Gary Stiehr
gary at umsl.edu

> Thanks,
> 
> - -- 
> Eray Ozkural (exa) <erayo at cs.bilkent.edu.tr>
> Comp. Sci. Dept., Bilkent University, Ankara
> www: http://www.cs.bilkent.edu.tr/~erayo
> GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (GNU/Linux)
> Comment: For info see http://www.gnupg.org
> 
> iD8DBQE7uW8IfAeuFodNU5wRAlutAJ9Wgh9MZQ3hrHdXk1YXp3X/cg+0mwCfflp3
> sBKq6HiRlooC/8AsjoLWLt4=
> =H78k
> -----END PGP SIGNATURE-----
> 






