[Beowulf] question about enforcement of scheduler use
Chris Dagdigian
dag at sonsorol.org
Mon May 22 14:15:24 PDT 2006
Sorry for not having specifics related to PBS, I'm usually using Grid
Engine or LSF for this type of work.
I can give you one piece of advice which I've learned the hard way
and have tested in several different deployments ...
In short, technical fixes or "sysadmin" approaches to mandating the
use of a scheduler will never work in the long run. All you do is end
up kicking off a technological arms race with your more savvy users.
An upset user looking to game the system is always going to have far
more time and motivation than an overworked cluster admin, so it
generally becomes a losing battle.
I've repeatedly found that it is far better in the long run to make
the scheduler system (and proper use of it) a policy matter. Clear
acceptable use policies need to be drafted with user input and
clearly communicated to everyone. After that, users who attempt to
bypass or game the system are referred to their manager. A 2nd
attempt to bypass the system gets reported up higher and a third
attempt results in the loss of cluster login access and a possible
referral to the HR department.
That said though, I work in commercial environments where scheduler
policies are in place to enforce fairshare-by-user or are used to
prioritize cluster resources according to very specific business,
scientific or research goals. In those settings it is very easy to
point out the costs of dealing with users who repeatedly bypass the system.
Going back to the technical side ... one trick that I've seen done
with Grid Engine takes advantage of the fact that every Grid
Engine-launched cluster task is going to be a child process of an
sge_shepherd daemon. I've seen clusters where there was a recurring
cron script that would search out and "kill -9" any user process that
was not a child of an sge_shepherd. The end result was that nobody
could run a job on a node unless it was under the control of the
scheduler. If PBS has the same sort of setup, and you can discern
between PBS-managed jobs and non-PBS-managed tasks, then a similar
approach could be taken.
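If you want to experiment with that, the rough shape of the reaper cron
job is sketched below. This is not lifted from any of those production
clusters; the UID cutoff and the exact ps options are my assumptions
and will need tuning for your site. Note that it is deliberately blunt:
anything a user runs that does not have the scheduler somewhere in its
ancestry gets killed, stray ssh sessions included.

  #!/bin/bash
  # Sketch: kill any non-system process on a compute node that does not
  # have an sge_shepherd somewhere in its process ancestry.
  # The UID cutoff (500) is an assumption; use whatever marks "real
  # users" on your systems.

  for pid in $(ps -e -o pid= -o uid= | awk '$2 >= 500 {print $1}'); do
      p=$pid
      managed=no
      # Walk up the process tree looking for sge_shepherd.
      while [ -n "$p" ] && [ "$p" -gt 1 ]; do
          if ps -o comm= -p "$p" 2>/dev/null | grep -q '^sge_shepherd'; then
              managed=yes
              break
          fi
          p=$(ps -o ppid= -p "$p" 2>/dev/null | tr -d ' ')
      done
      [ "$managed" = no ] && kill -9 "$pid" 2>/dev/null
  done

A PBS flavor of the same idea would presumably walk the tree looking
for pbs_mom instead, but I have only seen this done with Grid Engine.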
My $.02
-Chris
On May 22, 2006, at 8:45 AM, Larry Felton Johnson wrote:
>
> My apologies in advance if this is a FAQ, but I'm reading through the
> documentation and tinkering with the problem below simultaneously, and
> would appreciate help at least focussing the problem and avoiding
> going down useless paths (given my relative inexperience with
> clusters).
>
> I'm primarily a Solaris sysadmin (and a somewhat specialized one at
> that). I've been given the task of administering a cluster (40 nodes
> + head) put together by Atipa, and have been scrambling to come up to
> speed on Linux on the one hand and the cluster-specific software and
> config files on the other.
>
> I was asked by the folks in charge of working with the end users to
> help migrate to enforcement of the use of a scheduler (in our case
> PBSpro). In preparation for this I was asked to isolate four nodes
> and make those nodes accessible to end users only via PBSpro.
>
> The most promising means I found in my searches was the one used
> by Dr. Weisz, of modifying the PAM environment, limits.conf, and the
> PBS prologue and epilogue files. I found his document describing the
> approach, but have not found his original prologue and epilogue
> scripts.
>
> However, I wrote prologue and epilogue scripts that did what he
> described (wrote a line of the form "${USER} hard maxlogins 18 #${JOB_ID}"
> to the limits.conf file on the target node, and erased it after the
> job was completed).
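>
> Roughly, the pair look something like the sketch below (this is from
> memory, so treat the paths and the argument handling as things to
> check against the pbs_mom documentation rather than as exact copies):
>
>   #!/bin/sh
>   # prologue: pbs_mom runs this on the node before the job starts.
>   # PBS passes the job id and the user name as the first two arguments
>   # (worth verifying against your PBSpro version).
>   JOB_ID=$1
>   USER=$2
>   echo "${USER} hard maxlogins 18 #${JOB_ID}" >> /etc/security/limits.conf
>   exit 0
>
>   #!/bin/sh
>   # epilogue: pbs_mom runs this after the job finishes; it removes the
>   # limits.conf line that the prologue added for this job.
>   JOB_ID=$1
>   sed -i "/#${JOB_ID}\$/d" /etc/security/limits.conf
>   exit 0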
>
> If we limit the job to one node, the prologue and epilogue scripts run
> with the intended effect. The problem is that when we put the other
> three target nodes in play, we get a failure on three of the nodes,
> which I suspect is due to an attempt by the application to communicate
> via ssh under the user's id laterally from node to node.
>
> PBS hands the job off to node037, which successfully runs its prologue
> file.
>
> Here's the contents of the output file:
>
> Starting 116.head Thu May 18 15:10:48 CDT 2006
> Initiated on node037
>
> Running on 4 processors: node037 node038 node039 node040
>
>
> Here's the error file:
>
> Connection to node038 closed by remote host.
> Connection to node039 closed by remote host.
> Connection to node040 closed by remote host.
> =>> PBS: job killed: walltime 159 exceeded limit 120
>
>
> To clean up my question a bit I'll break it into four chunks:
>
> 1) Is the general approach I'm using appropriate for my intended effect
> (isolating four nodes and enforcing the use of the PBSpro scheduler
> on those nodes)?
>
> 2) If so, what's the best way of allowing node-to-node communication,
> if indeed that's my likely problem?
>
> 3) If not, does anyone have any other strategies for achieving what I'm
> after?
>
> 4) If the answer is RTFM, could someone steer me towards the FMs or
> parts thereof I need to be perusing :-)
>
> Thanks in advance.
>
> Larry
> --
> ========================================================
> "I learned long ago, never to wrestle with a pig. You
> get dirty, and besides, the pig likes it."
>
> George Bernard Shaw
> ========================================================
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf