[Beowulf] first cluster

Chris Dagdigian dag at sonsorol.org
Fri Jul 16 10:01:01 PDT 2010

You want the honest answer?

There are technical things you can do to prevent users from bypassing 
the scheduler and resource allocation policies. One of the cooler things 
I've seen in Grid Engine environments was a cron job that did a "kill 
-9" against any user process that was not a child of an sge_shepherd 
daemon. Very effective.
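A minimal Python sketch of that kind of cron job follows. It assumes a Linux compute node with /proc mounted, that the Grid Engine shepherd appears with a command name starting with "sge_shepherd", and that ordinary users have UIDs of 1000 or more. The function names and the min_uid threshold are illustrative assumptions, not part of any Grid Engine tool. It defaults to a dry run that only prints what it would kill, and it is meant for compute nodes, not the head node where users legitimately log in.

```python
import os
import signal

SHEPHERD_PREFIX = "sge_shepherd"  # assumed command-name prefix of the SGE shepherd


def read_proc_table(proc_root="/proc"):
    """Return {pid: (ppid, comm, uid)} read from the Linux /proc filesystem."""
    table = {}
    if not os.path.isdir(proc_root):
        return table  # not a Linux host with /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            # Owner of /proc/<pid> is normally the process's effective UID.
            uid = os.stat(os.path.join(proc_root, entry)).st_uid
        except OSError:
            continue  # process exited while we were scanning
        # Format is "pid (comm) state ppid ..."; comm may contain spaces or
        # parentheses, so split on the LAST ')'.
        lpar, rpar = stat.index("("), stat.rindex(")")
        comm = stat[lpar + 1:rpar]
        fields = stat[rpar + 2:].split()  # fields[0] = state, fields[1] = ppid
        table[int(entry)] = (int(fields[1]), comm, uid)
    return table


def rogue_pids(table, min_uid=1000):
    """PIDs owned by ordinary users that have no sge_shepherd ancestor."""
    rogue = []
    for pid, (ppid, _comm, uid) in table.items():
        if uid < min_uid:
            continue  # leave root and system accounts alone
        under_shepherd = False
        seen = set()
        cur = ppid
        # Walk up the parent chain until we find a shepherd or run out.
        while cur in table and cur not in seen:
            seen.add(cur)
            parent_ppid, parent_comm, _ = table[cur]
            if parent_comm.startswith(SHEPHERD_PREFIX):
                under_shepherd = True
                break
            cur = parent_ppid
        if not under_shepherd:
            rogue.append(pid)
    return sorted(rogue)


def main(dry_run=True):
    for pid in rogue_pids(read_proc_table()):
        if dry_run:
            print(f"would kill {pid}")
        else:
            try:
                os.kill(pid, signal.SIGKILL)  # the "kill -9"
            except ProcessLookupError:
                pass  # already gone


if __name__ == "__main__":
    main(dry_run=True)
```

Walking the whole ancestor chain, rather than checking only the immediate parent, matters because a batch job's mpirun or shell is usually a grandchild of the shepherd, not a direct child. Running it from root's crontab with dry_run=False would enforce the policy; leaving the dry run on first is a cheap way to see who would be hit.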

Other people play games with PAM settings and the like.

The honest truth is that technical countermeasures are mostly a waste of 
time. A motivated user always has more time and effort to spend trying 
to game the system than an overworked administrator.

My recommendation is to subject users to a cluster acceptable use 
policy. Any abuse of the policy is treated as a teamwork and human 
resources issue. The first time you screw up, you get a warning; the 
second time you get caught, I send a note to your manager. After that, 
any abuse is met with loss of cluster access and a referral to 
human resources for further action.

Simply put -- you don't have enough time in the day to deal with users 
who want to game/abuse the system. It's far easier for all concerned to 
have everyone agree on a fair use policy and treat any infractions via 
management rather than cluster settings.

This is another reason why having a cluster governance body helps a lot. 
A committee of cluster power users and IT staff is a great way to get 
consensus on queue setup, cluster policies, disk quotas and the like. 
They can also come down hard with peer pressure on pissy users.

my $.02


Douglas Guptill wrote:
> How does the presence of a job scheduler interact with the ability of a user to
>    ssh to <head>,
>    ssh to <compute-node-n>, and then type
>    mpirun -np 64 my_application
> Intuition tells me there has to be something in a cluster setup, when
> it has a scheduler, that prevents a user from circumventing the
> scheduler by doing something like the above.
