[Beowulf] Users abusing screen

Fri Oct 21 09:10:36 PDT 2011

On 10/21/2011 11:24 AM, Reuti wrote:
> Hi,
> 
> Am 21.10.2011 um 15:10 schrieb Prentice Bisbal:
> 
>> Beowulfers,
>>
>> I have a question that isn't directly related to clusters, but I suspect
>> it's an issue many of you are dealing with or have dealt with: users using
>> the screen command to stay logged in on systems and running long jobs
>> that they forget about. Have any of you experienced this, and how did
>> you deal with it?
>>
>> Here's my scenario:
>>
>> In addition to my cluster, we have a bunch of "compute servers" where
>> users can run the programs. These are "large" boxes with more cores
>> (24-32 cores) and more RAM (128 - 256 GB, ECC) than they'd have on a
>> desktop.
>>
>> Periodically, when I have to shutdown/reboot a system for maintenance,
>> I find a LOT of shells being run through the screen command for users
>> who aren't logged in. The majority are idle shells, but many are running
>> jobs that seem to be forgotten about. For example, I recently found
>> some jobs running since July or August that were running under the
>> account of someone who hasn't even been here for months!
>>
>> My opinion is that these are shared resources, and if you aren't
>> interactively using them, you should log out to free up resources for
>> others. If you have a job that can be run non-interactively, you should
>> submit it to the cluster.
>>
>> Has anyone else here dealt with the problem?
>>
>> I would like to remove screen from my environment entirely to prevent
>> this. My fellow sysadmins here agree. I'm expecting massive backlash
>> from the users.
> 
> I disallow rsh to the machines and limit ssh to admin staff. Users who want to run something on a machine have to go through the queuing system to get access to a node granted by GridEngine. For the startup method you can use either -builtin- or, in case you need X11 forwarding, ssh with a different sshd_config (GridEngine will start one daemon per task); one additional step is necessary for a tight integration of ssh.
> 
> For users just checking their jobs on a node I have a dedicated queue (where they can always log in, but with h_cpu limited to 60 seconds, so they can't abuse it).
> 
> -- Reuti
> 

Reuti,

That was EXACTLY my original plan, but for reasons I don't want to get
into, I can't implement that. In fact, just yesterday I ripped out all
the SGE queues I had configured to do that. Why? Because I was tired of
seeing them and being reminded of what a good idea it was. :(
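
For the audit step described in the quoted message (finding screen sessions
owned by users who are no longer logged in), a minimal sketch might look like
the following. It assumes a Linux-style `ps` that accepts
`-eo user=,pid=,etime=,comm=` and a `who` that prints the user name as the
first field, and it assumes the screen process shows up under the command name
`screen` or `SCREEN`. The function names are hypothetical, and the script only
reports; it does not kill anything.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo user=,pid=,etime=,comm=` output into tuples.

    Each tuple is (user, pid, elapsed_time, command_name).
    Caveat: ps may not print long user names in full.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4:
            user, pid, etime, comm = parts
            rows.append((user, int(pid), etime, comm.strip()))
    return rows


def parse_who(who_output):
    """Return the set of user names that currently have a login session."""
    return {line.split()[0] for line in who_output.splitlines() if line.strip()}


def orphaned_screens(ps_rows, logged_in):
    """Keep screen processes whose owner has no current login session."""
    return [
        row for row in ps_rows
        if row[3].lower() == "screen" and row[0] not in logged_in
    ]


def audit():
    """Run ps and who on the local machine and return orphaned screen processes."""
    ps_out = subprocess.run(
        ["ps", "-eo", "user=,pid=,etime=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    who_out = subprocess.run(
        ["who"], capture_output=True, text=True, check=True,
    ).stdout
    return orphaned_screens(parse_ps(ps_out), parse_who(who_out))
```

Calling `audit()` on one of the servers returns a list of tuples, and the
elapsed-time field (in the `[[dd-]hh:]mm:ss` form that ps prints) gives a rough
idea of how long each session has been sitting there. Note that a user who is
logged in on one terminal but also has a forgotten screen session elsewhere
would not be reported by this check.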

--
Prentice