[Beowulf] New member, upgrading our existing Beowulf cluster
Chris Samuel csamuel at vpac.org
Thu Dec 3 18:32:12 PST 2009

----- "Greg Lindahl" <lindahl at pbm.com> wrote:

> That kind of policy has a fairly high opportunity
> cost, even before you factor in linked nodes.

Well we cannot dictate to our users what they do,
we set a maximum walltime of 3 months and tell users
that they should checkpoint (if they have control of
the application and have coding skills).

> E.g. you see a system disk going bad, but the user
> will lose all their output unless the job runs for
> 4 more weeks...

We run SMART tests and the like trying to proactively
spot bad disks (and other hardware) prior to failures,
but yes, that's inevitable.

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
