[Beowulf] New member, upgrading our existing Beowulf cluster

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Tue Dec 8 09:56:49 PST 2009

On 12/8/09 9:22 AM, "james bardin" <jbardin at bu.edu> wrote:

> On Tue, Dec 8, 2009 at 10:50 AM, Prentice Bisbal <prentice at ias.edu> wrote:
>> You'd hope that. Most of my current cluster users are scientific
>> researchers in academia, not computer scientists. While some are
>> extremely computer savvy, others have learned just enough about
>> programming to do their calculations. Expecting the latter to write code
>> with checkpointing is unrealistic, and working in academia, I can't
>> force them to. Which is why taking down 4 nodes instead of just one is
>> less than ideal.
> I find it's still advantageous to push them to learn it. A researcher
> working with a tight deadline for a grant will often see the light
> when a hardware failure loses them a month or more of data processing.
> It really is in their own best interests to learn about their tools.

What about some form of "image checkpoint", like hibernation? It should be
application-unaware and just snapshot memory.
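For contrast with the application-unaware approach, here is a minimal sketch of the application-level checkpointing the quoted messages describe. It assumes a Python job whose state fits in a small dict. The function names (load_state, save_state, run) and the pickle file are illustrative only, not from any existing tool. On startup the program reloads its last saved state, so a job killed by a node failure resumes from the last checkpoint instead of starting over.

```python
import os
import pickle
import tempfile


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path):
    """Write the checkpoint to a temp file in the same directory, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
    os.replace(tmp, path)


def run(n_steps, path, every=100):
    """Hypothetical long job: resumes at the saved step and checkpoints every `every` steps."""
    state = load_state(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for the real computation
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)  # final checkpoint
    return state["total"]
```

Writing to a temporary file and then calling os.replace means a crash in the middle of a save leaves the previous checkpoint intact, because on POSIX filesystems the rename replaces the old file in a single step.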
