<html theme="default" iconset="color"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body text="#000000">Honest advice ... aka my personal $.02 ... <br>
<br>
This is a problem that can't be entirely solved via technical means like
resource constraints or cgroup controls. It is more of a training,
knowledge transfer and acceptable use policy issue, and fixing the
problem has to include those elements. <br>
<br>
What I've learned over many years is that end-users looking to game the
system will always have more time and more motivation to find evasive
methods than IT and sysadmins have to catch and close the loopholes. <br>
<br>
I tend to recommend treating "head node abuse" as an employee behavior /
management issue, and I do only the bare minimum resource fencing on the
head nodes and submission nodes to keep them from being run into the
ground (a rough sketch of that fencing is further down, after the
process). <br>
<br>
The process works like this:<br>
<br>
<div style="margin-left: 40px;">- If you want to use the Cluster you
either take a short training course or if you are experienced you read
and sign our HPC acceptable use policy that clearly explains what you
can and cannot do on the head nodes, submit nodes and login nodes. We
also point you to all our documentation and training resources<br>
<br>
- The first one or two times you are "caught" abusing the head node, we
treat it as a simple training and knowledge transfer opportunity. There
are no real repercussions, and it is a good opportunity for IT to reach
out and work 1:1 with the end user to learn her/his requirements and
workflow interests. 99% of the time the head node abuse stops here. <br>
<br>
- The third time you are caught abusing the head node, your login access
is terminated until you review the acceptable use policy and return a
documented acknowledgement. Your manager is CC'd on these emails, but
there are no other repercussions.<br>
<br>
- The fourth time you are caught, we treat it as a non-trivial violation
of organizational policy. HR is notified, along with your management
chain. Your cluster access is terminated until some sort of process and
plan is worked through with HR and the user's manager.<br>
</div>
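<br>
On the fencing side: since RHEL7 is stuck on systemd 219 (the generic
user-.slice drop-ins only arrived around v239), the least-painful trick I
know of is a small pam_exec session hook that pushes runtime limits onto
each user's slice at login. The sketch below is just that, a sketch: the
script path, the 100% CPU quota and the 8G memory cap are placeholder
values to adjust for your own head nodes, and the hook needs to be
ordered after pam_systemd so the user slice already exists.<br>
<pre>
#!/bin/bash
# /etc/security/fence-login.sh (hypothetical path) -- pam_exec session hook.
# Applies per-user cgroup limits to user-UID.slice at interactive login.
[ "$PAM_TYPE" = "open_session" ] || exit 0
uid=$(id -u "$PAM_USER") || exit 0
# Leave root and system accounts alone.
[ "$uid" -ge 1000 ] || exit 0
# CPUQuota= and MemoryLimit= both exist in systemd 219; --runtime keeps
# the setting out of /etc so it disappears on reboot.
systemctl set-property --runtime "user-${uid}.slice" \
    CPUQuota=100% MemoryLimit=8G
exit 0
</pre>
<pre>
# /etc/pam.d/sshd -- add after the existing pam_systemd session line
# (and remember to make the script executable):
session    optional    pam_exec.so /etc/security/fence-login.sh
</pre>
If you would rather not touch PAM, libcgroup's cgred (/etc/cgconfig.conf
plus /etc/cgrules.conf) can steer user processes into a capped cgroup, and
plain ulimits in /etc/security/limits.conf still catch the worst of the
memory hogs. None of that replaces the process above; it just keeps the
box alive while you enforce it.<br>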
<br>
<blockquote type="cite"
cite="mid:CABOsP2P=cB1LD_DMV0jyXmygse=cenm96Ze6PQzrfG1AusT9Sw@mail.gmail.com"
style="border: 0px none ! important;">
<div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr"
style="margin:30px 25px 10px 25px;"><div
style="width:100%;border-top:2px solid
rgba(146,154,163,0.7);padding-top:10px;"> <div
style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">
<a style="color:#485664
!important;padding-right:6px;font-weight:500;text-decoration:none
!important;" href="mailto:mdidomenico4@gmail.com" moz-do-not-send="true">Michael
Di Domenico</a></div> <div
style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:
right;"> <font color="#909AA4"><span style="padding-left:6px">March
26, 2021 at 9:56 AM</span></font></div> </div></div>
<div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody"
__pbrmquotes="true"
style="color:#909AA4;margin-left:24px;margin-right:24px;"><div>does
anyone have a recipe for limiting the damage people can do on<br>login
nodes on rhel7. i want to limit the allocatable cpu/mem per<br>user to
some low value. that way if someone kicks off a program but<br>forgets
to 'srun' it first, they get bound to a single core and don't<br>bump
anyone else.<br><br>i've been poking around the net, but i can't find a
solution, i don't<br>understand what's being recommended, and/or i'm
implementing the<br>suggestions wrong. i haven't been able to get them
working. the most<br>succinct answer i found is that per user cgroup
controls have been<br>implemented in systemd v239/240, but since rhel7
is still on v219<br>that's not going to help. i also found some
wonkiness that runs a<br>program after a user logs in and hacks at the
cgroup files directly,<br>but i couldn't get that to work.<br><br>supposedly
you can override the user-{UID}.slice unit file and jam in<br>the
cgroup restrictions, but I have hundreds of users clearly that's<br>not
maintainable<br><br>i'm sure others have already been down this road.
any suggestions?<br></div>
</div>
</blockquote>
<br>
</body></html>