[Beowulf] Memory limit enforcement
Tim Cutts
tjrc at sanger.ac.uk
Wed Oct 10 00:23:14 PDT 2007
On 10 Oct 2007, at 5:47 am, Mike Davis wrote:
> We have been dealing with similar problems on one of our clusters.
> The solution that we're coming to is that we need a non-standard
> solution. With Sun Grid Engine, one could build a memory consumable
> and then have jobs request memory. One could even require jobs to
> request memory. The problem is that many times a user will not know
> how much memory to request.
If the memory requirements of the application are not known, then all
bets are off: there's basically nothing you can do to stop the
application either being killed by an arbitrarily low memory limit you
set or, at the other extreme, running the node out of memory.
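For reference, the consumable approach Mike describes just means
marking a memory complex consumable (and FORCED if you want to require
that jobs request it) and telling the scheduler how much each host has.
A rough sketch, assuming the stock mem_free complex, with names, units
and hostnames purely illustrative:

    # qconf -mc  -- make mem_free a consumable that jobs must request:
    #  name      shortcut  type    relop  requestable  consumable  default  urgency
    mem_free     mf        MEMORY  <=     FORCED       YES          0        0

    # qconf -me node01  -- declare how much memory this host really has:
    complex_values        mem_free=16G

    # jobs then have to request (and consume) memory explicitly:
    qsub -l mem_free=6G myjob.sh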
We do exactly what you suggest, but under LSF, which has resource
reservation for memory out of the box. Of course, it's not real
reservation, but it's reservation as far as the scheduler is
concerned. We then have a default memory limit on the queues which
is really very low indeed (1.9 GB, typically, because we have 2 GB
RAM per core on our nodes). If the user wants more memory, they have
to set a new, higher limit themselves. When they do, an esub script we
supply to LSF checks that they have given both the new memory limit and
a suitable resource selection and reservation option; if they have not,
the job is rejected. So, for example, if a user asks for a 6 GB memory
limit, the esub checks that they have requested a machine with at least
6 GB of free memory and reserved that memory with the scheduler. For
example:
-M6000000 -R"select[mem>6000] rusage[mem=6000]"
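The esub itself is only a few lines. A stripped-down sketch of the idea
(not our production script; it assumes LSF's standard esub plumbing,
where the submission options are written to the file named by
$LSB_SUB_PARM_FILE and exiting with $LSB_SUB_ABORT_VALUE rejects the
job - the exact variable carrying the -M value and its units can differ
between LSF versions):

    #!/bin/sh
    . "$LSB_SUB_PARM_FILE"   # defines LSB_SUB_RES_REQ, LSB_SUB_RLIMIT_RSS (-M), etc.
    # Anything above the ~1.9 GB queue default (taken as KB here) must
    # both select and reserve memory.
    if [ -n "$LSB_SUB_RLIMIT_RSS" ] && [ "$LSB_SUB_RLIMIT_RSS" -gt 2000000 ]; then
        if ! echo "$LSB_SUB_RES_REQ" | grep -q 'select\[mem>' || \
           ! echo "$LSB_SUB_RES_REQ" | grep -q 'rusage\[mem='; then
            echo "Jobs raising -M must also select and reserve memory with -R" >&2
            exit "$LSB_SUB_ABORT_VALUE"
        fi
    fi
    exit 0

so a 6 GB job ends up being submitted with something like

    bsub -M6000000 -R"select[mem>6000] rusage[mem=6000]" ./my_analysis

(the job itself being made up, of course).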
On our beowulf cluster, this has been fairly effective in reducing
the frequency with which nodes run out of memory - the jobs are
usually killed first. It's not 100% effective, though.
> We have been experimenting with using SGE 6's suspend feature with
> a Free RAM limit to stop (suspend) jobs that are going over the
> preset limit. The problem with this particular solution is that the
> reporting feature has a default timing of once every 40 seconds.
> This means that there will be some lag and that could cause
> problems with jobs that allocate RAM very quickly.
This is a problem with the LSF solution too. I don't think there's a
great deal that can be done about it, as others have said. The other
problem is that simply stopping the jobs leaves the node with suspended
processes that are often effectively deadlocked: you can't resume the
job without running out of memory, so you might as well have killed it
in the first place.
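For anyone who wants to experiment with it anyway, the SGE knobs
involved are the queue's suspend thresholds and the global load report
interval, which does indeed default to 40 seconds. A rough sketch, with
the queue name and threshold values purely illustrative:

    # qconf -mq all.q   -- suspend jobs when free RAM on a node drops too low:
    suspend_thresholds    mem_free=500M
    nsuspend              1

    # qconf -mconf      -- how often execds report load; the source of the ~40s lag:
    load_report_time      00:00:40

Shortening load_report_time narrows that window, but can't close it.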
>
> I still believe that the best solution is to make users aware of
> the memory requirements for their jobs and then have them use
> memory requests and common sense to get their work done.
Absolutely. If the user doesn't understand their application, all
bets are off.
Tim
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.