[Beowulf] New member, upgrading our existing Beowulf cluster

Greg Lindahl lindahl at pbm.com
Thu Dec 3 18:41:29 PST 2009

> > E.g. you see a system disk going bad, but the user
> > will lose all their output unless the job runs for
> > 4 more weeks...
> We run SMART tests and the like trying to proactively
> spot bad disks (and other hardware) prior to failures,
> but yes, that's inevitable.

It's not inevitable that the policy be that 3-month jobs are allowed.

But you know me: I never saw a battle I didn't want to fight :-) Arrr,
mateys, this be the BOFH, and I'm here to educate you about the right
way to use this here supercomputer... my way... or walk the plank!

-- greg

