[Beowulf] non-stop computing
Justin Y. Shi
shi at temple.edu
Wed Oct 26 06:12:10 PDT 2016
John's post is really funny! But I would only endorse Gavin's
recommendation, because it solves the problem statistically (and correctly).
On Wed, Oct 26, 2016 at 12:07 AM, Christopher Samuel <samuel at unimelb.edu.au> wrote:
> On 26/10/16 14:45, John Hanks wrote:
> > I'd suggest making NFS mounts hard, so processes can recover from an NFS
> > server reboot.
> ...plus set the NFS fsid for each export server side so they come back
> reproducibly each time...
> PS: I endorse what John said (now I've finished laughing). I'd suggest
> making sure you've at least got ECC memory and RAID, though, as memory
> and disks are the two parts that can go bad. When we had clusters with
> disks in compute nodes, those were the most frequent failures; now that
> we run diskless nodes, it's memory DIMMs. :-)
> All the best,
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci