[Beowulf] Users abusing screen

Steve Crusan scrusan at ur.rochester.edu
Wed Oct 26 14:14:13 PDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Oct 26, 2011, at 4:55 PM, Mark Hahn wrote:

>> sometime, and I've never seen a comment like yours before. You're out of
>> line.
> 
> hah.  Greg doesn't post all that much, but he's no stranger to the flame ;)
> 
> seriously, your question seemed to be about a general problem,
> but your motive, ulterior or not, seemed to be to get rid of screen.
> 
> IMO, getting rid of screen is BOFHishness of the first order.
> it's a tool that has valuable uses.  it's not the cause of your problem.


I agree. 

From reading this thread, the machine(s) in question seem to be some sort of interactive or login node(s). If these nodes were large-memory or SMP machines, we'd have our resource manager take care of long-running processes and other abuses.
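
For example (a sketch only, assuming a Torque/PBS scheduler; any resource manager has an equivalent), pushing interactive work through the batch system gets walltime and memory limits enforced for free:

    # illustrative resource values; adjust to site policy
    qsub -I -l nodes=1:ppn=4,walltime=04:00:00,mem=16gb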


> 
> on our login nodes, we have some basic limits (/etc/security/limit.conf)
> that prevent large or long processes or numerous processes.
> 
> * hard as 3000000
> * hard cpu 60
> * hard nproc 100
> * hard maxlogins 20
> 
> these are very arguable, and actually pretty loose.  our login nodes are
> intended for editing/compiling/submitting, maybe the occasional gnuplot/etc.
> there doesn't seem to be much resistance to the 3G as (vsz) limit, and 
> it does definitely cut down on OOM problems.  60 cpu-minutes covers any
> possible compile/etc (though it has caused problems with people trying to
> do very large scp operations.)  nproc could probably be much lower (20?)
> and maxlogins ought to be more like 5.


We actually just spun up a graphical login node for our less savvy users, who are more apt to run MATLAB, COMSOL, gnuplot, and other 'EZ button' graphical scientific software. This graphical login software (http://code.google.com/p/neatx/) has helped us a lot with novice users. It offers session resumption and clients for any platform, it's faster than X forwarding, and it's wrapped around SSH.

The node itself is 'fairly' heavy (8 procs, 72 GB of RAM), but we've implemented cgroups to stop abuses. Upon login (through SSH or NX), each user is added to his own control group, which has processor and memory limits. Since each user's processes stay inside that control group, it's easy to work directly with their processes and process trees, whether that means dynamic throttling or just killing them.
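
Roughly, the per-user setup looks like the sketch below. This is illustrative only: it assumes cgroup v1 with the cpu and memory controllers mounted under /cgroup, a pam_exec session hook that provides PAM_USER, and made-up limit values.

    #!/bin/sh
    # Sketch of a PAM session hook (pam_exec) that drops each login into
    # its own cgroup.  Assumes cgroup v1 controllers at /cgroup/cpu and
    # /cgroup/memory; the limits below are illustrative, not our real ones.
    CG=/cgroup
    GRP="users/$PAM_USER"

    mkdir -p "$CG/cpu/$GRP" "$CG/memory/$GRP"

    echo 2048       > "$CG/cpu/$GRP/cpu.shares"               # relative CPU weight
    echo 8589934592 > "$CG/memory/$GRP/memory.limit_in_bytes" # 8 GB cap

    # Move the calling session process in; every child (the user's shell,
    # matlab, etc.) inherits the group automatically.
    echo $PPID > "$CG/cpu/$GRP/tasks"
    echo $PPID > "$CG/memory/$GRP/tasks"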

On our login nodes that don't use control groups, we just kill any heavy computational process after a certain period of time, depending on whether or not it's a compilation step, gzip, etc. We state this in our documentation and usually give the user a warning and a grace period. We don't see this type of abuse anymore, because the few users who did it quickly learned (and apologized, imagine that!), or they were on the cgroup-limited login node, so their abuse didn't affect the system much.
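
Whether it's done by hand or from cron, the check boils down to something like the sketch below (the one-hour CPU-time threshold and the whitelist of commands left alone are illustrative, not our actual policy):

    #!/bin/sh
    # Find non-root processes that have burned more than an hour of CPU,
    # skip obviously legitimate commands, and kill the rest.
    ps -eo pid=,user=,cputime=,comm= | awk '
    {
        n = split($3, t, /[-:]/)            # cputime is [[dd-]hh:]mm:ss
        secs = t[n] + t[n-1] * 60
        if (n >= 3) secs += t[n-2] * 3600
        if (n == 4) secs += t[1] * 86400
        if (secs > 3600) print $1, $2, $4, secs
    }' | while read pid user comm secs; do
        case "$user" in root) continue ;; esac
        case "$comm" in gcc|g++|gfortran|gzip|tar|scp|rsync) continue ;; esac
        logger "login-node reaper: killing $comm ($pid) of $user after ${secs}s CPU"
        kill "$pid"
    done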

If the issue is processes that run far too long and abuse the system, cgroups or 'pushing' the users toward a batch system seems to work better than writing scripts that decide which processes to kill. Most ISV packages have a way to run computation in batch mode, so there's no need for MATLAB-type users to keep their applications running for 3 weeks in a screen session when they could be using the cluster.
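
As a sketch (assuming a Torque/PBS cluster; solve_model.m stands in for the user's own MATLAB code), the run that would otherwise squat in screen for weeks becomes an ordinary batch job:

    #!/bin/sh
    # Hypothetical submission script; resource requests are illustrative.
    #PBS -N matlab_batch
    #PBS -l nodes=1:ppn=1,walltime=72:00:00
    #PBS -l mem=4gb
    cd "$PBS_O_WORKDIR"
    matlab -nodisplay -nosplash -r "solve_model; exit" > solve_model.log 2>&1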

Either that, or use some sort of CPU/memory limits like those listed above, or cgroups: a process can then run forever, but it won't have enough CPU or memory share to make a difference.

Just my .02

> 
> we don't currently have an idle-process killer, though have thought of it.
> we only recently put a default TMOUT in place to cause a bit of gc on 
> forgotten login sessions.
> 
> we do have screen installed (I never use it myself.)
> 
> regards, mark hahn.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

 ----------------------
 Steve Crusan
 System Administrator
 Center for Research Computing
 University of Rochester
 https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJOqHgzAAoJENS19LGOpgqKDHQH/AqfAefrt3nusElS/OBnxgBK
Pf8tFuyjoJvLgt+3KX19ZL18r1b/BhdW3/1GZgSVVjQZcYkV6dtUq6VI545jqDag
lRY9kvyIhudKfVhFwGa87DbXSzYv5oDImf3UejsIiJvo20Bzxf7mdpToT+AGJ4gA
J2HzrZwjdZk/DYEJ7CpG9lfthDDq5mrTQTbzVCnFHvEiWpeoBvfd3gJOP94age0F
0ZQGLCgheRSJXLsOlq0y0vqr+7nzupSrLUk5A1YcUysSpk4Dc4mvUVJFE+QbStN6
dSiYHhKMxF5qJTXYOSAF4QDmIObyzlbFFmHCeTTWrCG7KeWtOZU4zUfN7TL3sO4=
=M5Pw
-----END PGP SIGNATURE-----


