<html theme="default" iconset="color"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body text="#000000">Honest advice ... aka my personal $.02 ... <br>
<br>
This is a problem that can't be entirely solved via technical means like
resource constraints or cgroup controls. It is more of a training,
knowledge transfer and acceptable use policy issue, and fixing the
problem has to include those elements. <br>
<br>
What I've learned over many years is that end-users looking to game the
system will always have more time and more motivation to find evasive
methods than IT and sysadmins have to catch and close the loopholes. <br>
<br>
I tend to recommend treating "head node abuse" as an employee behavior /
management issue, and I do only the bare minimum resource fencing on the
head nodes and submission nodes to keep them from being run into the
ground (a rough sketch of that fencing is further down, after the
process). <br>
<br>
The process works like this:<br>
<br>
<div style="margin-left: 40px;">- If you want to use the Cluster you
either take a short training course or if you are experienced you read
and sign our HPC acceptable use policy that clearly explains what you
can and cannot do on the head nodes, submit nodes and login nodes. We
also point you to all our documentation and training resources<br>
<br>
- The first one or two times you are "caught" abusing the head node, we
treat it as a simple training and knowledge transfer opportunity. There
are no real repercussions, and it is a good opportunity for IT to reach
out and work 1:1 with the end user to learn her/his requirements and
workflow interests. 99% of the time the head node abuse stops here. <br>
<br>
- The third time you are caught abusing the head node, your login access
is terminated until you review the acceptable use policy and return a
documented acknowledgement. Your manager is CC'd on these emails, but
there are no other repercussions.<br>
<br>
- The fourth time you are caught, we treat it as a non-trivial violation
of organizational policy. HR is notified, along with your management
chain. Your cluster access is terminated until some sort of process and
plan is worked through with HR and the user's manager.<br>
</div>
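<br>
On the fencing side: since RHEL7 is stuck on systemd 219 (the generic
user-.slice drop-ins only arrived around v239), the least-painful trick I
know of is a small pam_exec session hook that pushes runtime limits onto
each user's slice at login. The sketch below is just that, a sketch: the
script path, the 100% CPU quota and the 8G memory cap are placeholder
values to adjust for your own head nodes, and the hook needs to be
ordered after pam_systemd so the user slice already exists.<br>
<pre>
#!/bin/bash
# /etc/security/fence-login.sh (hypothetical path) -- pam_exec session hook.
# Applies per-user cgroup limits to user-UID.slice at interactive login.
[ "$PAM_TYPE" = "open_session" ] || exit 0
uid=$(id -u "$PAM_USER") || exit 0
# Leave root and system accounts alone.
[ "$uid" -ge 1000 ] || exit 0
# CPUQuota= and MemoryLimit= both exist in systemd 219; --runtime keeps
# the setting out of /etc so it disappears on reboot.
systemctl set-property --runtime "user-${uid}.slice" \
    CPUQuota=100% MemoryLimit=8G
exit 0
</pre>
<pre>
# /etc/pam.d/sshd -- add after the existing pam_systemd session line
# (and remember to make the script executable):
session    optional    pam_exec.so /etc/security/fence-login.sh
</pre>
If you would rather not touch PAM, libcgroup's cgred (/etc/cgconfig.conf
plus /etc/cgrules.conf) can steer user processes into a capped cgroup, and
plain ulimits in /etc/security/limits.conf still catch the worst of the
memory hogs. None of that replaces the process above; it just keeps the
box alive while you enforce it.<br>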
<br>
<blockquote type="cite"
cite="mid:CABOsP2P=cB1LD_DMV0jyXmygse=cenm96Ze6PQzrfG1AusT9Sw@mail.gmail.com"
style="border: 0px none ! important;">
<div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr"
style="margin:30px 25px 10px 25px;"><div
style="width:100%;border-top:2px solid
rgba(146,154,163,0.7);padding-top:10px;"> <div
style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">
<a style="color:#485664
!important;padding-right:6px;font-weight:500;text-decoration:none
!important;" href="mailto:mdidomenico4@gmail.com" moz-do-not-send="true">Michael
Di Domenico</a></div> <div
style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:
right;"> <font color="#909AA4"><span style="padding-left:6px">March
26, 2021 at 9:56 AM</span></font></div> </div></div>
<div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody"
__pbrmquotes="true"
style="color:#909AA4;margin-left:24px;margin-right:24px;"><div>does
anyone have a recipe for limiting the damage people can do on<br>login
nodes on rhel7. i want to limit the allocatable cpu/mem per<br>user to
some low value. that way if someone kicks off a program but<br>forgets
to 'srun' it first, they get bound to a single core and don't<br>bump
anyone else.<br><br>i've been poking around the net, but i can't find a
solution, i don't<br>understand what's being recommended, and/or i'm
implementing the<br>suggestions wrong. i haven't been able to get them
working. the most<br>succinct answer i found is that per user cgroup
controls have been<br>implemented in systemd v239/240, but since rhel7
is still on v219<br>that's not going to help. i also found some
wonkiness that runs a<br>program after a user logs in and hacks at the
cgroup files directly,<br>but i couldn't get that to work.<br><br>supposedly
you can override the user-{UID}.slice unit file and jam in<br>the
cgroup restrictions, but I have hundreds of users clearly that's<br>not
maintainable<br><br>i'm sure others have already been down this road.
any suggestions?<br></div>
</div>
</blockquote>
<br>
</body></html>