[Beowulf] first cluster
dag at sonsorol.org
Fri Jul 16 10:01:01 PDT 2010
You want the honest answer?
There are technical things you can do to prevent users from bypassing
the scheduler and resource allocation policies. One of the cooler things
I've seen in Grid Engine environments was a cron job that did a "kill
-9" against any user process that was not a child of a sge_shepherd
daemon. Very effective.
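The original cron script isn't shown in the post, so the following is only a minimal sketch of the same idea, not that script. It assumes a Linux node where processes are visible under /proc, that the shepherd's process name is exactly `sge_shepherd`, and that real users have UIDs of 1000 or more (a hypothetical cutoff; adjust for your site). The helper names `parent_map`, `strays` and `sweep` are made up for illustration. By default it only prints what it would kill; passing `dry_run=False` sends the actual SIGKILL.

```python
import os
import signal

SHEPHERD = "sge_shepherd"   # process name of the Grid Engine shepherd daemon
MIN_USER_UID = 1000         # hypothetical cutoff: UIDs below this are system accounts


def parent_map(proc_root="/proc"):
    """Return {pid: (ppid, name, real_uid)} read from /proc/<pid>/status (Linux only)."""
    table = {}
    if not os.path.isdir(proc_root):
        return table  # not Linux, or /proc not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        table[int(entry)] = (
            int(fields["PPid"]),
            fields["Name"].strip(),
            int(fields["Uid"].split()[0]),  # first Uid column is the real UID
        )
    return table


def strays(table, shepherd=SHEPHERD, min_uid=MIN_USER_UID):
    """Return sorted PIDs of user-owned processes with no shepherd ancestor."""
    result = []
    for pid, (ppid, _name, uid) in table.items():
        if uid < min_uid:
            continue  # leave root and system daemons alone
        found, seen, cur = False, set(), ppid
        # Walk up the parent chain; `seen` guards against a malformed cycle.
        while cur in table and cur not in seen:
            seen.add(cur)
            parent_ppid, parent_name, _ = table[cur]
            if parent_name == shepherd:
                found = True
                break
            cur = parent_ppid
        if not found:
            result.append(pid)
    return sorted(result)


def sweep(dry_run=True):
    """Print (or, with dry_run=False, SIGKILL) every stray user process."""
    for pid in strays(parent_map()):
        if dry_run:
            print(f"would kill -9 {pid}")
            continue
        try:
            os.kill(pid, signal.SIGKILL)  # equivalent of "kill -9"
        except ProcessLookupError:
            pass  # already gone


if __name__ == "__main__":
    sweep(dry_run=True)
```

Run from root's crontab on each compute node. Note that a user's own sshd session process also has no shepherd ancestor, so in live mode an interactive login on a compute node gets killed too. That is what makes the approach effective against "ssh in and run mpirun".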
Other people play games with pam settings and the like.
The honest truth is that technical countermeasures are mostly a waste of
time. A motivated user always has more time and effort to spend trying
to game the system than an overworked administrator.
My recommendation is to subject users to a cluster acceptable use
policy. Any abuses of the policy are treated as a teamwork and human
resources issue. The first time you screw up, you get a warning; the
second time you get caught, I'll send a note to your manager. After that
any abuses are treated with a loss of cluster access and a referral to
human resources for further action.
Simply put -- you don't have enough time in the day to deal with users
who want to game/abuse the system. It's far easier for all concerned to
have everyone agree on a fair use policy and treat any infractions via
management rather than cluster settings.
This is another reason why having a cluster governance body helps a lot.
A committee of cluster power users and IT staff is a great way to get
consensus on queue setup, cluster policies, disk quotas and the like.
They can also use peer pressure to come down hard on pissy users.
Douglas Guptill wrote:
> How does the presence of a job scheduler interact with the ability of a user to
> ssh to <head>,
> ssh to <compute-node-n>, and then type
> mpirun -np 64 my_application
> Intuition tells me there has to be something in a cluster setup, when
> it has a scheduler, that prevents a user from circumventing the
> scheduler by doing something like the above.
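For the quoted question, one partial answer that the post does not itself propose is a site wrapper that refuses to launch unless it is running inside a scheduler job. The sketch below assumes Grid Engine, which sets `JOB_ID` and `NSLOTS` in a job's environment. The function names are hypothetical, and how a real wrapper would hand off to the site's actual `mpirun` is left out.

```python
import os


def granted_slots(env):
    """Return the slot count Grid Engine granted this job, or None outside a job."""
    if "JOB_ID" not in env:
        return None  # no JOB_ID: we were not started by the scheduler
    try:
        return int(env.get("NSLOTS", ""))
    except ValueError:
        return None  # NSLOTS missing or malformed


def check_launch(requested_np, env=None):
    """Return granted slots, or raise PermissionError if the launch should be refused."""
    env = os.environ if env is None else env
    slots = granted_slots(env)
    if slots is None:
        raise PermissionError("not inside a Grid Engine job; submit with qsub first")
    if requested_np > slots:
        raise PermissionError(
            f"requested {requested_np} ranks but the job was granted {slots} slots"
        )
    return slots
```

From a plain ssh session, `check_launch(64)` raises `PermissionError`. Inside a job granted 64 slots, it returns 64. As the reply argues, a motivated user can simply call the real launcher directly, so this catches accidents rather than deliberate abuse.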