[Beowulf] scheduler and perl

Clements, Brent M (SAIC) brent.clements at bp.com
Thu Aug 3 07:15:17 PDT 2006

In addition:

Depending on which version of LSF you're running, you may want to take a
look at job slot limits. These can be applied per user, per host, per
processor, and/or per queue.

You also may want to look at hiring a consultant to come in and help you
design and implement Cluster Scheduling Policies.



-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Clements, Brent M (SAIC)
Sent: Thursday, August 03, 2006 9:03 AM
To: Xu, Jerry; Chris Dagdigian; beowulf at beowulf.org
Subject: RE: [Beowulf] scheduler and perl

If I recall from my LSF days, you can limit the number of jobs that a
user can run at one time based upon queue policy.
This is also the case with Maui/Moab and some other policy-based job
schedulers.


-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Xu, Jerry
Sent: Wednesday, August 02, 2006 8:38 AM
To: Chris Dagdigian; beowulf at beowulf.org
Subject: RE: [Beowulf] scheduler and perl

Hi, Dear Joe, Chris:
  Thanks so much for your warm-hearted discussion. I used to manage a
cluster used by much "nicer" MPI application developers, who knew
exactly what they were doing, submitting fewer jobs but taking most of
the nodes to run MPI stuff.
  Now I am facing lots of users who are basically running a bunch of
serial jobs, know a little bit of perl and shell, figure out how
powerful the loop is, and then begin to "bomb" the cluster :-)

  We use LSF. I guess it will be okay supporting thousands of jobs; I
am just not used to it, I think.



-----Original Message-----
From: Chris Dagdigian [mailto:dag at sonsorol.org]
Sent: Tuesday, August 01, 2006 9:23 PM
To: Xu, Jerry; beowulf at beowulf.org
Subject: Re: [Beowulf] scheduler and perl

As Joe mentioned, the way we handle this is by using cluster schedulers
sitting on robust hardware platforms that are capable of handling large
numbers of job submissions without problems. Grid Engine and Platform
LSF are two capable products that come to mind and scale well.

The fact that your users are using "qsub" is a good thing that you
certainly want to encourage. It puts their jobs under the control of a
scheduler and allows you to do policy-based allocation of your computing
resources.

The alternative is your users bypassing the scheduler altogether by
SSH'ing to a node and just manually starting programs. Attempts to
bypass the scheduler are common in some environments so consider
yourself lucky that your users are using the scheduler at all!

The problem of specific users or perl loops bringing the system down
with a giant load of rapid qsub submissions is usually best handled on a
per-user or per-use-case basis.

It's more a matter of education and making sure your users have a
resource who can help them with their job scripts and the general tasks
of cluster application integration. Your users are not intentionally
trying to cause problems on the system (most likely) but it appears
clear that they may need some assistance on how to better use the
existing cluster.

Not giving users sufficient application integration and cluster
scripting support resources is a problem I see all the time. Too many
cluster operators think that training users on a few scheduler
submission and status commands is all the integration help that they
need to provide.  The end result is someone writing a shell or perl
script that tries to submit a few million short running tasks all at
once ...

Ways you can deal with the situation:

- Examine the user scripts, see if their script can be altered to put
"more work" into each individual qsub job submission. This will reduce
the number of qsub commands required

- Tell your users that the use of rapid loops for job submission is
causing system problems. Work with them to introduce a small delay into
their submissions. It is in everyone's best interest not to bring down
the master scheduler.

- Look into a feature that some scheduling systems call "array jobs"
or "job arrays" -- for schedulers that support this feature, it is a
very powerful way to use a single qsub/bsub command to launch hundreds
of thousands of jobs. I know that an SGE design goal is to support the
submission of a single job array with up to 500,000 individual
sub-tasks. Both SGE and LSF do job arrays very well. This feature only
works well if the workflow consists of similar commands that vary only
slightly (the input file or a command line argument, for instance).
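
As a rough sketch of the first two suggestions above (more work per
submission, plus a small delay between submissions), something like the
following Python could stand in for a tight submission loop. The helper
names (chunk, submit_batches), the batch size and the delay are all made
up for illustration, and the submit callable is an assumption: it would
be whatever wraps your site's own qsub/bsub command line.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunk(tasks: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` tasks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(tasks), size):
        yield list(tasks[start:start + size])


def submit_batches(
    tasks: Sequence[str],
    batch_size: int,
    delay_seconds: float,
    submit: Callable[[List[str]], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Submit tasks in batches, pausing between submissions.

    `submit` is a hypothetical callable that sends one batch to the
    scheduler (for example by running a single qsub or bsub command).
    Returns the number of submissions made.
    """
    submissions = 0
    for batch in chunk(tasks, batch_size):
        if submissions:
            # Pause before every submission except the first one.
            sleep(delay_seconds)
        submit(batch)
        submissions += 1
    return submissions


if __name__ == "__main__":
    # Stand-in for a real scheduler call: just show what would be sent.
    def show(batch: List[str]) -> None:
        print("would submit one job covering:", " ".join(batch))

    inputs = [f"input_{i}.dat" for i in range(1, 8)]
    submit_batches(inputs, batch_size=3, delay_seconds=0.0, submit=show)
```

With a batch size of 1,000, a million short tasks become a thousand
scheduler submissions instead of a million, and the delay keeps even
those from arriving all at once.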

So in summary:

  - Be happy users are issuing qsub commands at all!
  - Treat the looping problem as a sign that your users may need some
application integration assistance/education
  - Work with the users that are causing problems, see if they can
introduce a delay
  - Look into "array job" functionality
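
For the array job item, here is a minimal sketch of what the submission
side and the task side might look like. It assumes Grid Engine's
"qsub -t 1-N" range syntax and LSF's bsub -J "name[1-N]" syntax, and
that the scheduler exports SGE_TASK_ID or LSB_JOBINDEX to each sub-task;
check your own scheduler's documentation before relying on any of it.
The script name, the job name and the helper function names are
hypothetical.

```python
import os
from typing import List, Mapping, Sequence


def sge_array_command(script: str, n_tasks: int) -> List[str]:
    """One Grid Engine submission covering task ids 1..n_tasks."""
    return ["qsub", "-t", f"1-{n_tasks}", script]


def lsf_array_command(script: str, n_tasks: int, name: str = "sweep") -> List[str]:
    """One LSF submission covering array indices 1..n_tasks."""
    return ["bsub", "-J", f"{name}[1-{n_tasks}]", script]


def task_index(environ: Mapping[str, str] = os.environ) -> int:
    """Return this sub-task's 1-based index from the scheduler environment.

    Values that are missing, non-numeric or below 1 are skipped, since
    they indicate the process is not running as an array sub-task.
    """
    for var in ("SGE_TASK_ID", "LSB_JOBINDEX"):
        value = environ.get(var, "")
        if value.isdigit() and int(value) >= 1:
            return int(value)
    raise RuntimeError("not running as an array sub-task")


def input_for_task(inputs: Sequence[str], index: int) -> str:
    """Map a 1-based task index to the input that sub-task should process."""
    if not 1 <= index <= len(inputs):
        raise IndexError(f"task index {index} is outside 1..{len(inputs)}")
    return inputs[index - 1]


if __name__ == "__main__":
    # Print the single command that would replace a 500-iteration loop.
    print(" ".join(sge_array_command("run_one.sh", 500)))
    print(" ".join(lsf_array_command("./run_one.sh", 500)))
```

The job script then uses the index to pick its own input file or
command line argument, which is why this approach suits workflows whose
commands differ only slightly from one another.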

Regarding the problem of people bypassing the scheduler and logging into
nodes directly via SSH to run tasks -- I've posted on this exact topic
on this list before, you may be able to find it in an archive somewhere.
In short, my belief is that you'll never win the technological "arms
race" with the users when you try to block users who are bypassing the
scheduler.

Depending on your organizational environment, it is better to treat the
problem of users bypassing the scheduler as a Management/HR/Policy
problem rather than a technological problem.  Set up a good scheduler
with resource allocation policies that have been accepted by the users.
Then make a policy that everyone who wants to use shared resources must
operate under the scheduler. After that, make sure that people are
informed that scheduler/cluster abuse is a policy matter that will be
referred up the management chain and eventually to the human resources
department.  It's a matter of policy and acceptable use, not technology.

My $.02


On Aug 1, 2006, at 5:36 PM, Xu, Jerry wrote:

> Hi, Thanks, Joe.
>  I do not mean to "ban" anything immediately; I am just curious how
> often this happens in the HPC community.
> Perl/shell is a really strong tool. One example is using a loop to
> submit a huge number of jobs, which puts a burden on the scheduler
> server; another is having one job sit idle and frequently use system
> calls to detect job status and resubmit jobs again and again; another
> is using system calls and ssh to reach each node and run stuff,
> bypassing the scheduler... It just drives me crazy sometimes.
>  How do you guys handle issues like this?

Beowulf mailing list, Beowulf at beowulf.org To change your subscription
(digest mode or unsubscribe) visit
