[Beowulf] Jeff Squayres MPI proposals
Justin Y. Shi
shi at temple.edu
Mon Mar 7 06:24:52 PST 2016
It's also interesting to observe that the API designs follow the XML-style
(key, value) pair model in all these startups that want to handle failures ...
On Mon, Mar 7, 2016 at 8:32 AM, John Hearns <hearnsj at googlemail.com> wrote:
> Indeed. Some interesting news here:
> Us old style guys are going to have our lunch money stolen by young
> upstarts. Or is that startups?
> Seriously - these guys know how to keep things running at scale and how to
> tolerate failures.
> On 3 March 2016 at 23:30, Christopher Samuel <samuel at unimelb.edu.au> wrote:
>> On 04/03/16 06:40, Douglas Eadline wrote:
>> > Yes, failure needs to be an option.
>> The Slurm folks have been working on failure management support for a
>> little while, the idea being you can have a pool of spare nodes to pick
>> from (or alternatively bargain with a scheduler for a node that's
>> currently busy to come free later on and then add it to the job,
>> potentially extending the walltime to make up for the shortfall).
>> A better description from someone with higher caffeination is here:
>> All the best,
>> Christopher Samuel Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/ http://twitter.com/vlsci
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit