[Beowulf] Torque user quotas
Tiago Marques
a28427 at ua.pt
Wed Jul 1 18:47:08 PDT 2009
Hi all,
Being somewhat of a noob with Beowulf-type clusters, I must
ask: what do you use to manage user quotas for job
queueing with Torque and Maui? Gold Allocation Manager? Or
does SGE do something like this? I've been browsing the
web but couldn't find much.
Our current cluster uses just Maui + Torque.
The cluster currently accepts jobs on a FCFS basis, but
this behaviour is far from ideal. I would like jobs to
continue running for as long as the user wishes, with
the usage "charged" to the user's account. The balance
would then be used to decide whose waiting job gets to
run next when a node becomes available.
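
To make the "charge the account, then pick the next job by balance" idea concrete, here is a minimal sketch in Python. It is not how Maui, Gold or SGE implement it; the class UsageLedger and its methods are hypothetical names, and usage is assumed to be measured in node-days.

```python
class UsageLedger:
    """Hypothetical ledger: tracks node-days charged to each account."""

    def __init__(self):
        self.charged = {}  # account name -> node-days used so far

    def charge(self, account, nodes, days):
        # A job that held `nodes` nodes for `days` days costs nodes * days.
        self.charged[account] = self.charged.get(account, 0.0) + nodes * days

    def next_account(self, waiting):
        # The waiting account with the least charged usage runs next.
        # min() keeps the first minimum, so ties fall back to queue order.
        return min(waiting, key=lambda a: self.charged.get(a, 0.0))


ledger = UsageLedger()
ledger.charge("alice", nodes=2, days=3)  # 6 node-days
ledger.charge("bob", nodes=1, days=2)    # 2 node-days
chosen = ledger.next_account(["alice", "bob"])
```

Here the account with the smaller balance is chosen when a node frees up, regardless of who submitted first.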
Ideally, quotas would be defined by group, with each group
containing several users. Each group would be given a
specific number of nodes, such that the groups' node
allocations sum to the total number of nodes.
Say we have 8 nodes and three groups; the node quotas
would then be like this:
- group1 would have 2 nodes
- group2 would have 2 nodes
- group3 would have 4 nodes
So that if the cluster is always full and time usage is
the same between groups then group1 would be using 2
nodes, group2 two other nodes, etc.
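
As a small illustration of the quota scheme above, the node quotas can be turned into target shares of the cluster. This is only a sketch; target_shares is a hypothetical helper, not part of Torque or Maui.

```python
from fractions import Fraction


def target_shares(node_quota):
    """Convert per-group node quotas into fractions of the whole cluster."""
    total = sum(node_quota.values())
    return {group: Fraction(n, total) for group, n in node_quota.items()}


quota = {"group1": 2, "group2": 2, "group3": 4}
shares = target_shares(quota)
# The quotas must add up to the number of nodes in the cluster (8 here).
total_nodes = sum(quota.values())
```

With these quotas, group1 and group2 are each entitled to a quarter of the cluster and group3 to half.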
Now, say that group1 has used twice as much time as
group2, and group3 has used triple what group2 used, i.e.
exactly its quota:
- group1 would have used 8 days
- group2 would have used 4 days
- group3 would have used 12 days (I'm oversimplifying
and probably will screw up the math somewhere)
Group3 has used 50% of the time, so its quota is fine,
but group2 is way behind group1. So the allocation system
should stop allocating nodes to group1's jobs until its
usage is level with group2's again. Assuming that group3's
usage rate remains constant and that all the nodes are
booked:
- group1 would remain at 8 days
- group2 would reach 8 days of usage
- group3 would now be at 16 days of usage
The normal 2-2-4 quotas would then be in place again. Or,
ideally, this would be smoothed out over time, as in a
1-3-4 split, so that nobody is left unable to run
calculations for a long time just because they used the
cluster heavily while another group didn't use it for
months.
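
The catch-up scenario, and the smoothed variant, can be checked with a short simulation. This is a rough sketch under stated assumptions: the usage figures are treated as node-days, the function step and its give_up parameter are hypothetical, and nodes released by an over-quota group all go to the group furthest below its share.

```python
from fractions import Fraction


def step(quota, used, days=1, give_up=None):
    """Advance usage by `days`, shifting nodes away from over-quota groups.

    give_up=None: an over-quota group releases all its nodes (hard block).
    give_up=k:    it releases at most k nodes (smoothed catch-up).
    """
    total_q = sum(quota.values())
    total_u = sum(used.values())
    if total_u == 0:
        return {g: quota[g] * days for g in quota}
    # Positive deficit: the group has used less than its target share.
    deficit = {g: Fraction(quota[g], total_q) - Fraction(used[g], total_u)
               for g in quota}
    alloc = dict(quota)
    neediest = max(quota, key=deficit.get)
    for g in quota:
        if deficit[g] < 0:
            moved = alloc[g] if give_up is None else min(give_up, alloc[g])
            alloc[g] -= moved
            alloc[neediest] += moved
    return {g: used[g] + alloc[g] * days for g in quota}


quota = {"group1": 2, "group2": 2, "group3": 4}
used = {"group1": 8, "group2": 4, "group3": 12}
hard = step(quota, used)             # group1 blocked: a 0-4-4 split
soft = step(quota, used, give_up=1)  # group1 gives up one node: 1-3-4
```

After one day the hard block gives 8, 8 and 16, matching the figures in the list above, while the smoothed 1-3-4 split gives 9, 7 and 16, so group1 keeps computing and group2 catches up more slowly.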
The problem here is that we have different groups with
different numbers of researchers, and some groups have
allocated more grant funding to the cluster than others
and hence should be entitled to a fair share of its usage.
This situation is likely to remain for a good while, so
automating the quotas would be ideal.
Is there any kind of solution that provides this sort of
behaviour, even if only for users and not groups?
Best regards,
Tiago Marques