[Beowulf] non-stop computing

Justin Y. Shi shi at temple.edu
Wed Oct 26 06:12:10 PDT 2016


John's post is really funny! But I would endorse only Gavin's
recommendation, because it solves the problem statistically (and correctly).

Justin

On Wed, Oct 26, 2016 at 12:07 AM, Christopher Samuel <samuel at unimelb.edu.au>
wrote:

> On 26/10/16 14:45, John Hanks wrote:
>
> > I'd suggest making NFS mounts hard, so processes can recover from an NFS
> > server reboot.
>
> ...plus set the NFS fsid for each export server side so they come back
> reproducibly each time...
>
> PS: I endorse what John said (now that I've finished laughing). I'd still
> suggest making sure you've at least got ECC memory and RAID, as memory and
> disks are the two parts that can go bad. When we had clusters with disks in
> compute nodes, the disks were the most frequent failures; now that we run
> diskless nodes, it's memory DIMMs. :-)
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>