<div dir="ltr">I agree with Chris D that this is more of a human problem than a technical problem. I have actually had a lot of success with user education -- people don't often think about the implications of having lots of people logged into the same head node, but they get the idea when you explain it, especially when you put it along the lines of, "if we let all these other people test their MPI jobs on the head node, it would slow down YOUR work!"<div><br></div><div>Granted, people don't tend to read that explanation in the onboarding doc, and I often have to re-explain it when it comes up in practice. ;-) But in general I rarely see "repeat offenders", and when it does happen, removing access is the right policy.<br><div><br></div><div>We do ALSO enforce some per-user limits with cgroups (auto-generating the user-{UID}.slice as part of the user onboarding process). But in practice this mostly protects against accidental abuse ("whoops, I launched mpirun in the wrong terminal!"). The rare people who intentionally misuse the head node will find workarounds.</div><div><br></div><div>Arbiter looks really interesting, but I haven't had a chance to play with it yet. Need to bump that further up the priority list...</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 26, 2021 at 8:27 AM Prentice Bisbal via Beowulf <<a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Yes, there's a tool developed specifically for this called Arbiter that <br>
uses Linux cgroups to dynamically limit resources on a login node based <br>
on its current load. It was developed at the University of Utah:<br>
<br>
<a href="https://dylngg.github.io/resources/arbiterTechPaper.pdf" rel="noreferrer" target="_blank">https://dylngg.github.io/resources/arbiterTechPaper.pdf</a><br>
<br>
<a href="https://gitlab.chpc.utah.edu/arbiter2/arbiter2" rel="noreferrer" target="_blank">https://gitlab.chpc.utah.edu/arbiter2/arbiter2</a><br>
<br>
Prentice<br>
<br>
On 3/26/21 9:56 AM, Michael Di Domenico wrote:<br>
> does anyone have a recipe for limiting the damage people can do on<br>
> login nodes on rhel7. i want to limit the allocatable cpu/mem per<br>
> user to some low value. that way if someone kicks off a program but<br>
> forgets to 'srun' it first, they get bound to a single core and don't<br>
> bump anyone else.<br>
><br>
> i've been poking around the net, but i can't find a solution, i don't<br>
> understand what's being recommended, and/or i'm implementing the<br>
> suggestions wrong. i haven't been able to get them working. the most<br>
> succinct answer i found is that per-user cgroup controls have been<br>
> implemented in systemd v239/240, but since rhel7 is still on v219<br>
> that's not going to help. i also found some wonkiness that runs a<br>
> program after a user logs in and hacks at the cgroup files directly,<br>
> but i couldn't get that to work.<br>
><br>
> supposedly you can override the user-{UID}.slice unit file and jam in<br>
> the cgroup restrictions, but I have hundreds of users, so clearly that's<br>
> not maintainable.<br>
><br>
> i'm sure others have already been down this road. any suggestions?<br>
> _______________________________________________<br>
> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
> To change your subscription (digest mode or unsubscribe) visit <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>
</blockquote></div>
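<div><br></div><div>The per-user cgroup limits mentioned above (auto-generating a user-{UID}.slice during onboarding) could look something like the minimal Python sketch below. The post does not say what the generated units actually contain, so the drop-in location (/etc/systemd/system/user-{UID}.slice.d/), the file name 50-login-limits.conf, and the 100% CPU / 4G memory values are all assumptions. The sketch sticks to CPUQuota= and MemoryLimit=, the cgroup-v1 resource-control directives that RHEL 7's systemd 219 understands; whether the drop-in takes effect for the logind-created slice on a particular system should be verified before relying on it.</div>
<pre>
```python
"""Hypothetical onboarding helper: write a systemd drop-in that caps one
user's slice. Paths, file names and limit values are illustrative
assumptions, not the poster's actual setup."""
import tempfile
from pathlib import Path

# Assumed defaults: one CPU's worth of time and 4 GiB of RAM per user.
DEFAULT_CPU_QUOTA = "100%"
DEFAULT_MEMORY_LIMIT = "4G"


def render_dropin(cpu_quota: str = DEFAULT_CPU_QUOTA,
                  memory_limit: str = DEFAULT_MEMORY_LIMIT) -> str:
    """Return the text of a [Slice] drop-in using cgroup-v1-era
    directives (CPUQuota=, MemoryLimit=)."""
    return (
        "[Slice]\n"
        "CPUAccounting=yes\n"
        f"CPUQuota={cpu_quota}\n"
        "MemoryAccounting=yes\n"
        f"MemoryLimit={memory_limit}\n"
    )


def dropin_path(uid: int, unit_dir: Path) -> Path:
    """Drop-in file for user-{UID}.slice under unit_dir (name is hypothetical)."""
    return unit_dir / f"user-{uid}.slice.d" / "50-login-limits.conf"


def write_dropin(uid: int, unit_dir: Path, **limits: str) -> Path:
    """Create (or overwrite) the drop-in for one UID and return its path."""
    path = dropin_path(uid, unit_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(**limits))
    return path


if __name__ == "__main__":
    # Demo against a scratch directory. A real onboarding script would
    # presumably target /etc/systemd/system (as root) and then run
    # `systemctl daemon-reload`.
    with tempfile.TemporaryDirectory() as scratch:
        p = write_dropin(1001, Path(scratch))
        print(p.relative_to(scratch))
        print(p.read_text(), end="")
```
</pre>
<div>Because the file contents are generated from one template at account-creation time rather than edited by hand, this approach scales to hundreds of users: each new UID gets one small file, and changing the policy means regenerating them all.</div>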