[Beowulf] New member, upgrading our existing Beowulf cluster
Greg Lindahl
lindahl at pbm.com
Thu Dec 3 18:41:29 PST 2009
> > E.g. you see a system disk going bad, but the user
> > will lose all their output unless the job runs for
> > 4 more weeks...
>
> We run SMART tests and the like trying to proactively
> spot bad disks (and other hardware) prior to failures,
> but yes, that's inevitable.
It's not inevitable that the policy be that 3-month jobs are allowed.
But you know me: I never saw a battle I didn't want to fight :-) Arrr,
mateys, this be the BOFH, and I'm here to educate you about the right
way to use this here supercomputer... my way... or walk the plank!
-- greg