[Beowulf] [External] head node abuse

Rémy Dernat remy.dernat at umontpellier.fr
Mon Mar 29 07:19:51 UTC 2021


Hi,

IMHO, this PAM approach is a very neat solution. The only thing it lacks is a
network limitation (maybe just add a traffic-shaper solution, like tc?).
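For example, a minimal egress cap with tc might look like the following (the interface name and rate are assumptions; true per-user shaping would additionally need something like the net_cls cgroup plus a matching tc filter):

```shell
# Hedged sketch: cap total outbound traffic on the login node's
# interface with a token bucket filter (eth0 and 500mbit are assumptions)
tc qdisc add dev eth0 root tbf rate 500mbit burst 64kb latency 400ms
```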

Best regards

On 26/03/2021 at 17:30, Lohit Valleru via Beowulf wrote:
> I have just used a simple PAM script to apply cgroup rules to every 
> user who logs into a CentOS 7 login node.
> Something like this:
>
> #!/bin/sh -e
>
> # Resolve the numeric UID of the user PAM is logging in
> PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)
>
> # Only constrain regular users (UID >= 1000), not system accounts
> if [ "${PAM_UID}" -ge 1000 ]; then
>     /bin/systemctl set-property "user-${PAM_UID}.slice" \
>                    CPUQuota=100% MemoryLimit=2G
> fi
>
> This is not sophisticated and does not change parameters depending 
> on dynamic load, but it does set static limits for every user via 
> cgroups.
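A script like the one above is typically hooked in with pam_exec, which exports PAM_USER into the script's environment; a hedged sketch of the sshd PAM configuration (the script path is an assumption):

```shell
# /etc/pam.d/sshd -- run the limit script when a session opens;
# pam_exec.so sets PAM_USER in the environment for the script
# (/usr/local/sbin/limit-user.sh is an assumed path)
session    optional    pam_exec.so type=open_session /usr/local/sbin/limit-user.sh
```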
>
> However, the above does not cover every scenario: it does not 
> restrict the number of threads, network load, network file system load 
> (NFS/GPFS/Lustre), or paging, etc.
> I have actually seen cases where cgroups caused more stress while 
> trying to limit resources such as memory for users who happened to run 
> hundreds of threads and still managed to stay within the memory/CPU 
> limit. It so happens that cgroups do not kill every application that 
> goes beyond its limits, as long as the application tries to stay within 
> them.
> I tried limiting the number of threads with cgroups, and it caused 
> issues where ssh connections were killed when threads went beyond the limit.
> Also, I recently realized that Java does not recognize cgroups 
> for its garbage collection, and instead assumes that all of physical 
> memory is available.
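One common workaround for the Java point is to pin the heap explicitly rather than trusting the JVM's auto-sizing from physical RAM (the values below are assumptions; newer JDKs, roughly 8u191+ and 10+, can also honor cgroup limits via -XX:+UseContainerSupport):

```shell
# Hedged: size the JVM heap below the cgroup memory limit instead of
# letting the JVM derive it from total physical memory (values are assumptions)
java -Xms256m -Xmx1g -jar app.jar
```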
>
> I do not know if Arbiter somehow resolves the above issues and 
> behaves much better than simple cgroup limits, or if Red Hat 8 happens 
> to be better.
>
> I do want to mention that for an ideal solution, I go with Chris 
> Dagdigian's response: it is best to educate users and follow up 
> accordingly.
>
> At the same time, I do wish there was a good solution. I also thought 
> about cases where I could write an ssh wrapper around a bsub/qsub 
> interactive job command that would allow users to use compute nodes as 
> interactive nodes for a while, to compile/edit or submit their scripts, 
> but this would only be easy if all the compute nodes were directly 
> reachable over the network, rather than restricted to a private network.
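A sketch of that wrapper idea, assuming a Slurm cluster with an "interactive" partition (the partition name and limits are assumptions; on LSF/PBS the srun line would become a bsub/qsub interactive submission instead):

```shell
#!/bin/sh
# Hedged sketch: turn a login into a scheduled interactive job so the
# user's shell runs on a compute node under scheduler-enforced limits
exec srun --pty --partition=interactive \
     --cpus-per-task=1 --mem=2G "${SHELL:-/bin/bash}" -l
```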
>
> Thank you,
> Lohit
>
> On Fri, Mar 26, 2021 at 10:27 AM Prentice Bisbal via Beowulf 
> <beowulf at beowulf.org> wrote:
>
>     Yes, there's a tool developed specifically for this called Arbiter
>     that
>     uses Linux cgroups to dynamically limit resources on a login node
>     based on its current load. It was developed at the University of Utah:
>
>     https://dylngg.github.io/resources/arbiterTechPaper.pdf
>
>     https://gitlab.chpc.utah.edu/arbiter2/arbiter2
>
>     Prentice
>
>     On 3/26/21 9:56 AM, Michael Di Domenico wrote:
>     > does anyone have a recipe for limiting the damage people can do on
>     > login nodes on rhel7.  i want to limit the allocatable cpu/mem per
>     > user to some low value.  that way if someone kicks off a program but
>     > forgets to 'srun' it first, they get bound to a single core and
>     don't
>     > bump anyone else.
>     >
>     > i've been poking around the net, but i can't find a solution, i
>     don't
>     > understand what's being recommended, and/or i'm implementing the
>     > suggestions wrong.  i haven't been able to get them working. 
>     the most
>     > succinct answer i found is that per user cgroup controls have been
>     > implemented in systemd v239/240, but since rhel7 is still on v219
>     > that's not going to help.  i also found some wonkiness that runs a
>     > program after a user logs in and hacks at the cgroup files directly,
>     > but i couldn't get that to work.
>     >
>     > supposedly you can override the user-{UID}.slice unit file and jam in
>     > the cgroup restrictions, but I have hundreds of users; clearly that's
>     > not maintainable.
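On systemd v239 and later (e.g. RHEL 8), those per-user restrictions can indeed be set once for all users through a drop-in on the user-.slice template, rather than per-UID overrides; a hedged sketch (the limit values are assumptions):

```shell
# Hedged: on systemd >= v239, one drop-in applies to every user-UID.slice
# (not available on RHEL 7's systemd v219)
mkdir -p /etc/systemd/system/user-.slice.d
cat > /etc/systemd/system/user-.slice.d/50-limits.conf <<'EOF'
[Slice]
CPUQuota=100%
MemoryMax=2G
EOF
systemctl daemon-reload
```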
>     >
>     > i'm sure others have already been down this road.  any suggestions?
>     > _______________________________________________
>     > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>     > To change your subscription (digest mode or unsubscribe) visit
>     > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>

-- 
Rémy Dernat
IS Project Manager
IR CNRS - ISI / ISEM



More information about the Beowulf mailing list