[Beowulf] Jeff Squyres MPI proposals
samuel at unimelb.edu.au
Thu Mar 3 15:30:23 PST 2016
On 04/03/16 06:40, Douglas Eadline wrote:
> Yes, failure needs to be option.
The Slurm folks have been working on failure management support for a
little while now. The idea is that you can have a pool of spare nodes to
pick from, or alternatively bargain with the scheduler for a node that's
currently busy to come free later on and then be added to the job,
potentially extending the walltime to make up for the shortfall.
A better description from someone with higher caffeination is here:
All the best,
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545