[Beowulf] Torque user quotas

Tiago Marques a28427 at ua.pt
Wed Jul 1 18:47:08 PDT 2009


Hi all,

Being somewhat of a noob with Beowulf-type clusters, I must 
ask: what do you use to manage user quotas for job 
queueing with Torque and Maui? Gold Allocation Manager? Or 
does SGE do something like this? I've been browsing the 
web but couldn't find much.

Our current cluster uses just Maui + Torque.
The cluster currently accepts jobs on a FCFS basis, but 
this behaviour is far from ideal. I would like jobs to 
keep running for as long as the user wishes, with the 
usage "charged" to the user's account. The balance would 
then decide whose waiting job gets to run next when a node 
becomes available.

Ideally, quotas would be defined per group, with each 
group containing several users. Each group would be given 
a specific number of nodes, such that the groups' node 
quotas add up to the total number of nodes in the cluster.

Say we have 8 nodes and three groups. The node quotas 
would then be like this:

  - group1 would have 2 nodes
  - group2 would have 2 nodes
  - group3 would have 4 nodes

So if the cluster is always full and every group's time 
usage is in line with its quota, group1 would be using 2 
nodes, group2 two other nodes, etc.
Now, say that group1 has used twice as much time as 
group2, and group3 has used three times what group2 used, 
which is exactly its quota:

  - group1 would have used 8 days
  - group2 would have used 4 days
  - group3 would have used 12 days

(I'm oversimplifying and will probably screw up the math 
somewhere.)

Group3 has used 50% of the time, so its quota is fine, 
but group2 is way behind group1. So the allocation system 
should stop allocating nodes to group1's jobs until its 
usage is level with group2's again. Assuming that group3 
keeps using its nodes at the same rate and that all the 
nodes are booked:

  - group1 would remain at 8 days
  - group2 would reach 8 days of usage
  - group3 would now be at 16 days of usage
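
To make the arithmetic above concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration of the example numbers, not how Maui, Gold or SGE actually work; the function names usage_gap and next_group are made up for this post, and usage is simply counted in days as above.

```python
from fractions import Fraction

# Node quotas and accumulated usage (in days) from the example above.
quota = {"group1": 2, "group2": 2, "group3": 4}
used = {"group1": 8, "group2": 4, "group3": 12}


def usage_gap(quota, used):
    """Return each group's share of the usage minus its share of the quota.

    Positive means the group is ahead of its quota, negative means behind.
    """
    total_quota = sum(quota.values())
    total_used = sum(used.values())
    gap = {}
    for g in quota:
        # With no usage recorded yet, every usage share counts as zero.
        used_share = Fraction(used[g], total_used) if total_used else Fraction(0)
        gap[g] = used_share - Fraction(quota[g], total_quota)
    return gap


def next_group(quota, used):
    """Pick the group that is furthest behind its quota (hypothetical rule)."""
    gap = usage_gap(quota, used)
    return min(gap, key=gap.get)


gap = usage_gap(quota, used)
for g in sorted(gap):
    print(g, gap[g])
print("next job goes to:", next_group(quota, used))
```

With the numbers above, group1 comes out 1/12 ahead of its share, group2 1/12 behind, and group3 exactly on quota, so the next free node would go to group2.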

And the normal 2-2-4 quotas would be in place again. Or, 
ideally, this would be smoothed out over time, e.g. with a 
temporary 1-3-4 split, so that nobody is locked out of 
running calculations for a long time just because they 
used the cluster heavily while another group didn't use it 
for months.
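
As a sanity check on the two options (locking group1 out versus a temporary 1-3-4 split), here is a small Python sketch that runs the cluster full, one day at a time, until usage is back in proportion to the quotas. The names in_proportion and days_to_rebalance are made up for illustration, and I'm assuming each node adds one "day" of usage per day to the group holding it, which matches the numbers in the example but is still a simplification.

```python
from fractions import Fraction

quota = {"group1": 2, "group2": 2, "group3": 4}
used = {"group1": 8, "group2": 4, "group3": 12}


def in_proportion(quota, used):
    """True when every group's usage share equals its quota share."""
    total_quota = sum(quota.values())
    total_used = sum(used.values())
    if total_used == 0:
        return True  # nothing used yet, so nothing is out of proportion
    return all(
        Fraction(used[g], total_used) == Fraction(quota[g], total_quota)
        for g in quota
    )


def days_to_rebalance(quota, used, split, max_days=365):
    """Run the cluster full under a temporary node split, one day at a time.

    Returns (days, usage) once usage is back in proportion to the quotas,
    or None if that does not happen within max_days.
    """
    usage = dict(used)
    for day in range(max_days + 1):
        if in_proportion(quota, usage):
            return day, usage
        for g, nodes in split.items():
            usage[g] += nodes  # assumption: one node adds one day of usage per day
    return None


# Hard lock-out: group1 gets nothing until group2 has caught up.
print(days_to_rebalance(quota, used, {"group1": 0, "group2": 4, "group3": 4}))
# Smoothed: group1 keeps one node while group2 catches up.
print(days_to_rebalance(quota, used, {"group1": 1, "group2": 3, "group3": 4}))
```

Under these assumptions the lock-out restores the balance after 1 day (at 8, 8 and 16 days of usage), while the 1-3-4 split takes 2 days (at 10, 10 and 20) but never leaves group1 without a node. The check only looks at whole-day boundaries, so a split that balances mid-day would be missed.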

The problem here is that we have groups with different 
numbers of researchers, and some groups have put more of 
their research grants into the cluster than others, so 
they should be entitled to a correspondingly fair share of 
its usage. This situation is likely to last for a good 
while, so automating the quotas would be ideal.

Is there any kind of solution that provides this sort of 
behaviour, even if only for users and not groups?

Best regards,
Tiago Marques


