[Beowulf] New member, upgrading our existing Beowulf cluster
    Lux, Jim (337C) 
    james.p.lux at jpl.nasa.gov
       
    Tue Dec  8 09:56:49 PST 2009
    
    
  
On 12/8/09 9:22 AM, "james bardin" <jbardin at bu.edu> wrote:
> On Tue, Dec 8, 2009 at 10:50 AM, Prentice Bisbal <prentice at ias.edu> wrote:
> 
>> You'd hope that. Most of my current cluster users are scientific
>> researchers in academia, not computer scientists. While some are
>> extremely computer savvy, others have learned just enough about
>> programming to do their calculations. Expecting the latter to write code
>> with checkpointing is unrealistic, and working in academia, I can't
>> force them to. Which is why taking down 4 nodes instead of just one is
>> less than ideal.
>> 
> 
> I find it's still advantageous to push them to learn it. A researcher
> working with a tight deadline for a grant will often see the light
> when a hardware failure loses them a month or more of data processing.
> It really is in their own best interests to learn about their tools.

What about some form of "image checkpoint", like "hibernation"? It should be
application-unaware: it just snapshots memory.