[Beowulf] Supercomputers face growing resilience problems

Bogdan Costescu bcostescu at gmail.com
Fri Nov 23 06:34:12 PST 2012


On Fri, Nov 23, 2012 at 5:19 AM, Justin YUAN SHI <shi at temple.edu> wrote:
> This is NOT an impossible dream. The packet-switching network is a
> living example of such an architecture.

I don't really agree with the 'packet-switching network' analogy - a
network can be lossy and doesn't care about the intentions of the
upper levels in the stack. The analogy I would use is TCP - a reliable
protocol over an unreliable transport medium.
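
To make the TCP analogy concrete, here is a minimal sketch of the idea
(stop-and-wait retransmission over a channel that drops packets). It is
only an illustration: the names lossy_send and reliable_transfer, the
30% loss rate and the fixed seed are assumptions made for the example,
and real TCP adds sliding windows, timers and congestion control that
are not modelled here.

```python
import random


def lossy_send(packet, rng, loss_rate):
    """Hypothetical unreliable medium: return the packet, or None if dropped."""
    return None if rng.random() < loss_rate else packet


def reliable_transfer(messages, loss_rate=0.3, seed=0, max_tries=1000):
    """Stop-and-wait: resend each message until its acknowledgement arrives.

    Returns the list of delivered payloads and the number of data
    transmissions that were needed.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    delivered = []
    transmissions = 0
    expected = 0  # receiver side: next sequence number it will accept

    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            packet = lossy_send((seq, payload), rng, loss_rate)
            if packet is None:
                continue  # data packet lost: sender "times out" and resends

            # Receiver accepts only new data, but acknowledges duplicates too,
            # because the earlier acknowledgement may have been lost.
            if packet[0] == expected:
                delivered.append(packet[1])
                expected += 1

            ack = lossy_send(packet[0], rng, loss_rate)
            if ack == seq:
                break  # acknowledgement arrived: move on to the next message
        else:
            raise RuntimeError("gave up after max_tries attempts")

    return delivered, transmissions


if __name__ == "__main__":
    data = ["a", "b", "c", "d"]
    received, sent = reliable_transfer(data)
    print(received == data, sent)
```

The upper layer sees every message exactly once and in order, even
though the medium underneath loses both data and acknowledgements; the
cost shows up only as extra transmissions.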

> The missing piece in HPC
> applications is the principle of statistic multiplexed computing. In
> other words, the application architecture should be considered as a
> whole in the design space, not a "glued" together piece using lower
> layers with unsealed semantic "holes". The semantic "holes" between
> the layers are the real evils for all our troubles.

Wouldn't that lead to a monolithic approach where one
manufacturer/vendor/entity gives you all components with guarantees
that they work together well ? A.K.A. vendor lock-in. And it's no
longer limited to a certain level in the stack but it spans several
of them - possibly all of them.

> Our research exhibit (booth 3360) demonstrate a prototype data
> parallel system using this idea.

Care to share a link to a website or a published article ?

Cheers,
Bogdan

