[Beowulf] question about enforcement of scheduler use

Tue May 23 07:18:45 PDT 2006

Larry,

The prologue script only runs on the mother superior node, so it can
only alter the other nodes in the job via (probably) ssh.  I think Dr.
Weisz's script does this, although the version I've seen has "rsh"
hard-coded.  I'd check to see that the prologue script is actually
altering limits.conf on all of the nodes, since it looks like that could
be why you're seeing connection failures.
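
As a rough illustration of that check, here is a minimal Python sketch of
how a prologue on the mother superior could build the remote commands
needed to touch every node in the job.  The function names, the node list
argument, and the dry-run style are my own assumptions, not Dr. Weisz's
script; the limits.conf line follows the form Larry quotes below, and
/etc/security/limits.conf is the usual pam_limits location.  How you get
the node list inside the prologue depends on your PBS setup, so it is just
passed in here.

```python
import shlex

# Usual pam_limits location; adjust if your distribution differs (assumption).
LIMITS_CONF = "/etc/security/limits.conf"


def limits_line(user, job_id, max_logins=18):
    """Build the per-job limits.conf entry, tagged with the job id."""
    return f"{user}   hard maxlogins {max_logins}  #{job_id}"


def remote_append_commands(nodes, user, job_id, remote_shell="ssh"):
    """Return one argv list per node that appends the entry remotely.

    remote_shell is a parameter rather than hard-coded, so the same code
    works whether the cluster uses ssh or rsh.
    """
    line = limits_line(user, job_id)
    # Quote the entry so the remote shell sees it as a single argument.
    remote = f"echo {shlex.quote(line)} >> {LIMITS_CONF}"
    return [[remote_shell, node, remote] for node in nodes]
```

Printing the result of remote_append_commands(["node038", "node039",
"node040"], user, job_id) before handing each list to subprocess.run()
makes it easy to confirm that every node in the job, not just the mother
superior, is actually being updated.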

I think the way you're going about this is fine; we've tried the same
thing here at IU.  In the end, we just didn't have that many problem
users connecting directly to the compute nodes, so we abandoned the
restriction enforcement.  Our problems have more to do with orphaned MPI
processes hanging around on nodes, so we use a script to periodically
clean out processes owned by users who shouldn't be on the node.
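
A minimal sketch of that kind of cleanup check, in Python, might look like
the following.  This is not our actual script: the function names, the
UID cutoff of 500 for "real" users, and the idea that the set of allowed
users comes from the scheduler are all assumptions for illustration.  It
only reports stray processes; killing them is left to the caller.

```python
import subprocess

# Assumption: accounts below this UID are system accounts and are left alone.
MIN_USER_UID = 500


def parse_ps(output):
    """Parse 'pid uid user' lines into (pid, uid, user) tuples."""
    procs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, uid, user = fields
        procs.append((int(pid), int(uid), user))
    return procs


def list_processes():
    """Snapshot all processes on this node using POSIX ps options.

    Note: some ps implementations shorten or replace long user names,
    so matching on UID may be more robust on a real cluster.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "uid=", "-o", "user="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


def stray_processes(procs, allowed_users, min_uid=MIN_USER_UID):
    """Return (pid, user) for non-system processes of users not allowed here."""
    return [
        (pid, user)
        for pid, uid, user in procs
        if uid >= min_uid and user not in allowed_users
    ]
```

Run from cron on each node, something like stray_processes(list_processes(),
allowed_users) gives the list of candidates, where allowed_users is whatever
set of users the scheduler says currently has a job on that node.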

As far as a manual is concerned, all I can suggest is the PBSPro admin's
guide, but that's more for prologue/epilogue specifics, not techniques
for restricting access.  Another method for restricting access, assuming
users only need multiple nodes for MPI jobs, is to disable ssh access
entirely to the compute nodes, and require mpiexec (or LAM/MPI, OpenMPI
or some other MPI that's aware of the PBS tm API).  This allows
processes on the job nodes to be spawned directly from the pbs_mom,
rather than the mother superior node's pbs_mom spawning an mpirun
process that then uses ssh to start the multiple MPI processes.  This
requires some user education, though, and there will almost always be
cases where it's not going to work, such as users who require commercial
MPI distributions.

Matt

Larry Felton Johnson wrote:
> My apologies in advance if this is a FAQ, but I'm reading through the
> documentation and tinkering with the problem below simultaneously, and
> would appreciate help at least focussing the problem and avoiding
> going down useless paths (given my relative inexperience with clusters).
> 
> I'm primarily a Solaris sysadmin (and a somewhat specialized one at
> that).  I've been given the task of administering a cluster (40 nodes
> + head) put together by Atipa, and have been scrambling to come up to
> speed on Linux on the one hand and the cluster-specific software and
> config files on the other.
> 
> I was asked by the folks in charge of working with the end users to
> help migrate to enforcement of the use of a scheduler (in our case
> PBSpro).  In preparation for this I was asked to isolate four nodes
> and make those nodes only accessible to end users via PBSpro.
> 
> The most promising means I found in my searches was the one used
> by Dr. Weisz, of modifying the PAM environment, limits.conf, and the
> PBS prologue and epilogue files.  I found his document describing the
> approach, but have not found his original prologue and epilogue scripts.
> 
> However, I wrote prologue and epilogue scripts that did what he described
> (wrote a line of the form "${USER}   hard maxlogins 18  #${JOB_ID}"
> to the limits.conf file on the target node, and erased it after the job
> was completed).
> 
> If we limit the job to one node the prologue and epilogue scripts run
> with the intended effect.  The problem is when we put the other three
> target nodes in play, we get a failure on three of the nodes, which I
> suspect is due to an attempt by the application to communicate via ssh
> under the user's id laterally from node to node.
> 
> PBS hands the job off to node037 which successfully runs its prologue
> file.
> 
> Here's the contents of the output file:
> 
> Starting 116.head Thu May 18 15:10:48 CDT 2006
> Initiated on node037
> 
> Running on 4 processors: node037 node038 node039 node040
> 
> 
> Here's the error file:
> 
> Connection to node038 closed by remote host.
> Connection to node039 closed by remote host.
> Connection to node040 closed by remote host.
> =>> PBS: job killed: walltime 159 exceeded limit 120
> 
> 
> To clean up my question a bit I'll break it into four chunks:
> 
> 1) Is the general approach I'm using appropriate for my intended effect 
>    (isolating four nodes and enforcing the use of the pbspro scheduler
>    on those nodes)?
> 
> 2) If so what's the best way of allowing node-to-node communication, if
>    indeed that's my likely problem?
> 
> 3) If not does anyone have any other strategies for achieving what I'm
>    after?
> 
> 4) If the answer is RTFM could someone steer me towards the FMs or parts
>    thereof I need to be perusing :-)
> 
> Thanks in advance.
> 
> Larry

-- 
Matt Allen            |  Systems Analyst
malallen at indiana.edu  |  Research and Technical Services
812-855-7318          |  Indiana University
