[Beowulf] Re: Interesting google server design

Greg Lindahl lindahl at pbm.com
Sat Apr 4 15:10:57 PDT 2009


On Sat, Apr 04, 2009 at 05:24:23PM -0400, Jason Riedy wrote:
> And Robert G. Brown writes:
> > For them servicing/replacing a system is cheap: Box dies.
> > Employee notes this, grabs box from Big Stack of Boxes, carries
> > it to dead box, removes dead box, replace it with new working
> > box, presses power switch, walks away.
> 
> Plus, your operator can be unskilled.

Um, not completely. These clusters work by starting with 3 copies of
every chunk of the data, and as you work you have to make sure that
you don't take down the wrong system and leave the cluster with 0 or 1
copies of a chunk of data. There are software mechanisms you can use
to help, but the operator needs to know how the rules work.
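
A minimal sketch of the kind of check such a software mechanism might
do before an operator pulls a box. The names here (placement,
chunks_at_risk, safe_to_take_down, min_live) are hypothetical and not
taken from any real cluster software; the assumption is that you can
get a map from each chunk to the set of boxes holding a copy of it.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def chunks_at_risk(
    placement: Dict[str, Set[str]],
    node: str,
    down: Iterable[str] = frozenset(),
    min_live: int = 2,
) -> List[str]:
    """Return the chunks that would drop below min_live live copies
    if `node` were taken down while the boxes in `down` are already offline.

    placement is an assumed map: chunk id -> set of boxes holding a copy.
    """
    offline = set(down) | {node}
    at_risk = []
    for chunk, holders in placement.items():
        if node not in holders:
            continue  # this chunk is unaffected by taking `node` down
        live_copies = len(set(holders) - offline)
        if live_copies < min_live:
            at_risk.append(chunk)
    return sorted(at_risk)


def safe_to_take_down(
    placement: Dict[str, Set[str]],
    node: str,
    down: Iterable[str] = frozenset(),
    min_live: int = 2,
) -> bool:
    """True if no chunk would be left with fewer than min_live copies."""
    return not chunks_at_risk(placement, node, down, min_live)


if __name__ == "__main__":
    # Three copies of every chunk, spread over five boxes.
    placement = {
        "c1": {"a", "b", "c"},
        "c2": {"a", "d", "e"},
        "c3": {"b", "c", "d"},
    }
    print(safe_to_take_down(placement, "a"))              # all boxes up
    print(chunks_at_risk(placement, "a", down={"b"}))     # "b" already out
```

With min_live=2 this encodes the "never 0 or 1 copies" rule from
above: taking down one box is fine when everything else is up, but not
when another box sharing a chunk with it is already out.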

Some tasks, yeah, no problem: the box is already dead. But many
tasks involve boxes which aren't dead yet: one disk has failed, the box
needs a reboot to run a new kernel, there's a new release of the
application software, etc.

-- greg




