[Beowulf] [tt] One million ARM chips challenge Intel bumblebee

Nathan Moore ntmoore at gmail.com
Thu Jul 7 15:22:47 PDT 2011


Some of these "decay over time" questions have been worked on in the context
of detector design in high energy physics.  Everything in the big detectors
needs to be radiation hard...

On Thu, Jul 7, 2011 at 2:38 PM, Prentice Bisbal <prentice at ias.edu> wrote:

> On 07/07/2011 02:26 PM, Lux, Jim (337C) wrote:
> >>> It's all about ultimate scalability.  Anybody with a moderate
> >>> competence (certainly anyone on this list) could devise a scheme to
> >>> use 1000 perfect processors that never fail to do 1000 quanta of
> >>> work in unit time.  It's substantially more challenging to devise a
> >>> scheme to do 1000 quanta of work in unit time on, say, 1500
> >>> processors with a 20% failure rate.  Or even in 1.2*unit time.
> >>>
> >>
> >> Just to be clear - I wasn't saying this was a bad idea. Scaling up to
> >> this size seems inevitable. I was just imagining the team of admins who
> >> would have to be working non-stop to replace dead processors!
> >>
> >> I wonder what the architecture for this system will be like. I imagine
> >> it will be built around small multi-socket blades that are hot-swappable
> >> to handle this.
> >
> >
> >
> > I think that you just anticipate the failures and deal with them.  It's
> > challenging to write code to do this, but it's certainly a worthy
> > objective.  I can easily see a situation where the cost to replace dead
> > units is so high that you just don't bother doing it: it's cheaper to
> > just add more live ones to the "pool".
> >
>
> Did you read the paper that someone else posted a link to? I just read
> the first half of it. A good part of this research is focused on
> fault-tolerance/resiliency of computer systems. They're not just
> interested in creating a computer to mimic the brain, they want to learn
> how to mimic the brain's fault-tolerance in computers.
>
> To paraphrase the paper, we lose a neuron a second in our brains for our
> entire lives, but we never notice any problems from that. This research
> hopes to learn how to duplicate that with this computer, so you could
> say hardware failures are desirable and necessary for this research.
>
> Prentice
>
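
As a rough illustration of the 1500-processor example quoted above, here is a minimal sketch of the capacity side of the problem. It assumes each processor fails independently with probability 0.20 and that a failed processor does no useful work at all (both are simplifying assumptions, not something stated in the thread). The function names are made up for this example. It computes the chance that at least 1000 of the 1500 processors are still alive, working in log space so the large binomial coefficients do not overflow.

```python
from math import exp, lgamma, log


def log_binom_pmf(n: int, k: int, p: float) -> float:
    """Log of the binomial probability of exactly k successes in n trials."""
    # lgamma(x + 1) is log(x!), so this is log(n choose k) without huge integers.
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1.0 - p)


def prob_enough_survivors(n_procs: int, p_fail: float, needed: int) -> float:
    """P(at least `needed` of `n_procs` processors survive), for 0 < p_fail < 1.

    Assumes independent failures (an assumption made for this sketch).
    """
    p_ok = 1.0 - p_fail
    return sum(
        exp(log_binom_pmf(n_procs, k, p_ok))
        for k in range(needed, n_procs + 1)
    )


if __name__ == "__main__":
    # 1500 processors, 20% failure rate, 1000 quanta of work to cover.
    print(prob_enough_survivors(1500, 0.20, 1000))
    # No spares at all: every one of 1000 processors must survive.
    print(prob_enough_survivors(1000, 0.20, 1000))
```

Under these assumptions the expected number of survivors is 1500 * 0.8 = 1200, with a standard deviation of about 15.5, so raw capacity is not the issue. The hard part, as noted in the quoted text, is the scheme that notices failures and moves the work onto live processors.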



-- 
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - -   - - - - - - -   - - - - - - -