[Beowulf] Scheduler question -- non-uniform memory allocation to MPI
prentice.bisbal at rutgers.edu
Thu Jul 30 11:37:42 PDT 2015
I don't want to be 'that guy', but it sounds like the root cause of this
problem is the programs themselves. A well-written parallel program
should balance the workload and data pretty evenly across the nodes. Is
this software written by your own researchers, open-source, or a
commercial program? In my opinion, your efforts would be better spent
fixing the program(s), if possible, than finding a scheduler with the
feature you request, which I don't think exists.
If you can't fix the software, I think you're out of luck.
I was going to suggest that requesting exclusive use of nodes (whole-node
assignment) is the easiest solution. What is the basis for the resistance?
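For what it's worth, whole-node assignment is simple to express in SLURM. A minimal sbatch sketch (node and task counts are made up for illustration, and `./my_mpi_program` is a placeholder; `--mem=0` asks for all memory on each allocated node, so the cgroup limit is per node rather than per task and rank 0 can use whatever its node has free):

```shell
#!/bin/bash
# Whole-node request: no per-task memory cap, so an MPI job whose
# rank 0 uses far more memory than the other ranks won't be OOM-killed
# as long as it fits within one node's total memory.
#SBATCH --nodes=4             # four whole nodes (example value)
#SBATCH --ntasks-per-node=16  # MPI ranks per node (example value)
#SBATCH --exclusive           # don't share these nodes with other jobs
#SBATCH --mem=0               # grant the job all memory on each node

srun ./my_mpi_program         # placeholder for the actual application
```

The cost, of course, is that a job whose ranks are mostly small still ties up entire nodes, which is presumably where the resistance comes from.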
On 07/30/2015 11:34 AM, Tom Harvill wrote:
> We run SLURM with cgroups for memory containment of jobs. When users request
> resources on our cluster, many times they will specify the number of
> (MPI) tasks and
> memory per task. The reality of much of the software that runs is
> that most of the
> memory is used by MPI rank 0 and much less by the slave processes. This
> imbalance sometimes causes bad outcomes (OOMs and worse) during job runs.
> AFAIK SLURM is not able to allow users to request a different amount
> of memory
> for different processes in their MPI pool. We used to run Maui/Torque
> and I'm fairly
> certain that feature is not present in that scheduler either.
> Does anyone know if any scheduler allows the user to request different
> amounts of
> memory per process? We know we can move to whole-node assignment to solve
> this problem but there is resistance to that...
> Thank you!
> Tom Harvill
> Holland Computing Center
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing