[Beowulf] [External] head node abuse
Rémy Dernat
remy.dernat at umontpellier.fr
Mon Mar 29 07:19:51 UTC 2021
Hi,
IMHO, this PAM solution is a very neat one. It only lacks a network
limitation (maybe just add a traffic-shaper solution, like tc?).
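
A minimal sketch of the tc idea, assuming eth0 and a placeholder rate
(a real per-user setup would need classful qdiscs and a way to classify
traffic by uid or cgroup; this just caps the whole node's egress):

# Cap all outbound traffic on eth0 with a token bucket filter (as root):
tc qdisc add dev eth0 root tbf rate 100mbit burst 32kbit latency 400ms
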
Best regards
On 26/03/2021 at 17:30, Lohit Valleru via Beowulf wrote:
> I have just used a simple PAM script to apply cgroup rules to every
> user who logs into a CentOS 7 login node. Something like this:
>
> #!/bin/sh -e
>
> # Look up the numeric UID of the user PAM is handling.
> PAM_UID=$(getent passwd "${PAM_USER}" | cut -d: -f3)
>
> # Only constrain regular users (UID >= 1000), not system accounts.
> if [ "${PAM_UID}" -ge 1000 ]; then
>     /bin/systemctl set-property "user-${PAM_UID}.slice" \
>         CPUQuota=100% MemoryLimit=2G
> fi
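>
> (For completeness: a script like this would be wired up as a PAM
> session hook via pam_exec; the path below is just a placeholder for
> wherever the script is installed.)
>
> # In /etc/pam.d/sshd (and/or /etc/pam.d/login):
> session    optional    pam_exec.so /usr/local/sbin/limit-user.sh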
>
> This is not as sophisticated, and it does not change parameters
> depending on dynamic load, but it does set static cgroup limits for
> every user.
>
> However, the above does not cover every scenario: it does not
> restrict the number of threads, network load, network file system
> load (NFS/GPFS/Lustre), paging, etc.
> I have actually seen cases where cgroups caused more stress while
> trying to limit resources such as memory for users who happened to
> run hundreds of threads and still managed to stay within the
> memory/CPU limits. It turns out that cgroups do not kill every
> application that pushes against its limits: as long as the
> application stays within them, the kernel just keeps throttling and
> reclaiming, and that reclaim activity is itself load on the node.
> I tried limiting the number of threads with cgroups, and it caused
> issues: ssh connections were killed when threads went beyond the
> limit.
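>
> (That experiment was along these lines; 512 is an arbitrary example
> value, and TasksMax needs a systemd new enough to know it, v227+.)
>
> # Threads count as tasks, and new sshd sessions land in the same
> # slice, so a user already at the cap cannot even log in again:
> systemctl set-property "user-${PAM_UID}.slice" TasksMax=512
>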
> Also, I recently realized that Java does not recognize cgroup limits
> for its garbage collection, and instead assumes that all of physical
> memory is available.
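>
> (As far as I know, newer JDKs, 8u191+ and 10+, can be told to size
> the heap from the cgroup limit; app.jar is a placeholder:)
>
> # Let the JVM derive its heap from the cgroup memory limit instead
> # of from physical RAM:
> java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar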
>
> I do not know if Arbiter somehow resolves the above issues and
> behaves much better than simple cgroup limits, or if RHEL 8 happens
> to be better.
>
> I do want to mention that, for an ideal solution, I go with Chris
> Dagdigian's response: it is best to educate users and follow up
> accordingly.
>
> At the same time, I do wish there was a good solution. I also
> thought about cases where I could write an ssh wrapper around a
> bsub/qsub interactive job command that would let users use compute
> nodes as interactive nodes for a while, to compile/edit or submit
> their scripts. But this would only be easy if all the compute nodes
> were directly reachable over the network, and not restricted to a
> private network.
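>
> (A purely hypothetical sketch of that wrapper, installed as the
> user's login shell; LSF's bsub shown, and the equivalent for other
> schedulers would be qsub -I or srun --pty:)
>
> #!/bin/sh
> # Trade the login-node session for an interactive job with a
> # pseudo-terminal on a compute node:
> exec bsub -Is /bin/bash -l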
>
> Thank you,
> Lohit
>
> On Fri, Mar 26, 2021 at 10:27 AM Prentice Bisbal via Beowulf
> <beowulf at beowulf.org> wrote:
>
> Yes, there's a tool developed specifically for this called Arbiter
> that uses Linux cgroups to dynamically limit resources on a login
> node based on its current load. It was developed at the University
> of Utah:
>
> https://dylngg.github.io/resources/arbiterTechPaper.pdf
>
> https://gitlab.chpc.utah.edu/arbiter2/arbiter2
>
> Prentice
>
> On 3/26/21 9:56 AM, Michael Di Domenico wrote:
> > Does anyone have a recipe for limiting the damage people can do on
> > login nodes on RHEL 7? I want to limit the allocatable CPU/memory
> > per user to some low value. That way, if someone kicks off a
> > program but forgets to 'srun' it first, they get bound to a single
> > core and don't bump anyone else.
> >
> > I've been poking around the net, but I can't find a solution;
> > either I don't understand what's being recommended or I'm
> > implementing the suggestions wrong, because I haven't been able to
> > get them working. The most succinct answer I found is that per-user
> > cgroup controls were implemented in systemd v239/240, but since
> > RHEL 7 is still on v219 that's not going to help. I also found some
> > wonkiness that runs a program after a user logs in and hacks at the
> > cgroup files directly, but I couldn't get that to work.
> >
> > Supposedly you can override the user-{UID}.slice unit file and jam
> > in the cgroup restrictions, but I have hundreds of users, so
> > clearly that's not maintainable.
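> >
> > (i.e., one drop-in like this per UID, which is what doesn't scale;
> > UID 1000 and the values are just examples:)
> >
> > # /etc/systemd/system/user-1000.slice.d/limits.conf
> > [Slice]
> > CPUQuota=100%
> > MemoryLimit=2G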
> >
> > I'm sure others have already been down this road. Any suggestions?
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
--
Rémy Dernat
IT Project Manager
IR CNRS - ISI / ISEM