[Beowulf] [tt] One million ARM chips challenge Intel bumblebee
Douglas Eadline
deadline at eadline.org
Fri Jul 8 20:33:55 PDT 2011
>> > It's all about ultimate scalability. Anybody with a moderate
>> competence (certainly anyone on this
>> list) could devise a scheme to use 1000 perfect processors that never
>> fail to do 1000 quanta of work
>> in unit time. It's substantially more challenging to devise a scheme to
>> do 1000 quanta of work in
>> unit time on, say, 1500 processors with a 20% failure rate. Or even in
>> 1.2*unit time.
>> >
>>
>> Just to be clear - I wasn't saying this was a bad idea. Scaling up to
>> this size seems inevitable. I was just imagining the team of admins who
>> would have to be working non-stop to replace dead processors!
>>
>> I wonder what the architecture for this system will be like. I imagine
>> it will be built around small multi-socket blades that are hot-swappable
>> to handle this.
>
>
>
> I think that you just anticipate the failures and deal with them. It's
> challenging to write code to do this, but it's certainly a worthy
> objective. I can easily see a situation where the cost to replace dead
> units is so high that you just don't bother doing it: it's cheaper to just
> add more live ones to the "pool".
I wrote about the programming issue in a series of three articles
(conjecture, never really tried it, if only I had the time ...).
The first article links (at the end) to the other two.
http://www.clustermonkey.net//content/view/158/28/
And yes, disposable "nodes": just like a failed cable in a large
cluster, you route a new one and don't worry about unbundling
a huge cable tree. I assume there will be a high level of
integration, so there may be "nodes" that are left for dead while
still integrated into a much larger blade.
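
To make the "leave them for dead" idea concrete, here is a rough
sketch in Python. It is not taken from the articles; the function
name, the parameters and the failure model (each node dies
independently with a fixed probability while running a task, and is
never repaired) are assumptions for illustration only. The numbers
are the ones from the quoted example: 1000 quanta of work, 1500
nodes, 20% failure rate.

```python
import random
from collections import deque


def run_with_disposable_nodes(num_tasks=1000, num_nodes=1500,
                              failure_rate=0.2, seed=0):
    """Toy model: dead nodes are dropped from the pool, never replaced.

    Assumption: a node fails independently with probability
    `failure_rate` each time it runs a task; the task is then requeued.
    Returns (tasks_completed, nodes_still_alive, rounds_used).
    """
    rng = random.Random(seed)
    pending = deque(range(num_tasks))      # quanta of work still to do
    live_nodes = list(range(num_nodes))    # the "pool"
    done = set()
    rounds = 0

    while pending and live_nodes:
        rounds += 1
        survivors = []
        for node in live_nodes:
            if not pending:
                # Nothing left to hand out: the node idles and stays alive.
                survivors.append(node)
                continue
            task = pending.popleft()
            if rng.random() < failure_rate:
                # Node died mid-task: requeue the work, leave the node for dead.
                pending.append(task)
            else:
                done.add(task)
                survivors.append(node)
        live_nodes = survivors

    return len(done), len(live_nodes), rounds


if __name__ == "__main__":
    completed, alive, rounds = run_with_disposable_nodes()
    print(f"completed={completed} alive={alive} rounds={rounds}")
```

The spare nodes simply pick up whatever was requeued, so the
scheduler never needs to know which node died, only that a task did
not come back. A real system would also have to detect the failure
in the first place, which this sketch ignores.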
--
Doug
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>