[Beowulf] Pony: not yours.
hahn at mcmaster.ca
Thu May 16 07:28:12 PDT 2013
> Interesting, but depressing, presentation.
I found it unilluminating, actually. don't we all know about power issues?
to me it raised two interesting questions:
- what software and hardware architecture would better optimize communication
to address the flop/bps per-joule divergence? here's a bluesky/strawman:
put all the migration/coherence stuff into hardware. imagine if all cpus
were globally cache-coherent, and the objects being cached were not just
cache lines, and could have their sharing/migration semantics defined by
the program. you might be saying "oh, but caches are not energy-efficient
compared to static resources like registers". well, let's JIT the local
code to refer to the static local address of an item when cached, rather
than relying on a CAM - after all, if hardware is managing the cache, it
can rename all references in the instruction stream...
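the rename idea above can be sketched in software terms (this is a strawman illustration, not a hardware design; every name here is hypothetical): contrast a CAM-style associative lookup on every access with an accessor specialized once at cache-fill time, so the hot path refers to the local copy directly.

```python
# Strawman sketch (all names hypothetical): a dict stands in for a
# CAM (content-addressable memory); jit_rename() stands in for the
# hardware rewriting references to a cached item's static local address.

cam = {}  # associative cache: every cam_read() pays for a lookup

def cam_read(key):
    # generic path: associative search on each access
    return cam[key]

def jit_rename(key):
    # specialize once when the object is cached: emit an accessor
    # bound directly to the local copy, so no per-access lookup remains
    local_copy = cam[key]  # the item's "static local address"
    def renamed_read():
        return local_copy
    return renamed_read

cam["obj"] = 42
read_obj = jit_rename("obj")   # rename at cache-fill time
assert read_obj() == cam_read("obj") == 42
```

the point of the sketch is just that the lookup cost moves from every access to the one-time fill/rename step, which is the energy argument being made.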
- do we need exascale anyway? would the world be better off with a thousand
petascale machines? I know the field tends to view this as a kind of
manifest destiny, but why? the secondary argument usually devolves to
something like "well, the high end pushes the envelope so the masses can
benefit from trickle-down improvements." but if this is the main
justification, it's a bit twisted: we need to make clusters so big that we
can't afford to power them in order to force ourselves to develop more
power-efficient systems? if power is an important TCO component, why aren't
we optimizing for it already (in facilities of any size)?
> At HotInterconnects in 2011, Intel gave a presentation about the reductions
> in power per flop over time. The corresponding communication power
> consumption per bit was flat (no improvement). Projecting the trends
well, it has to be said that most computation is mundane and doesn't require
multiple processors, let alone big clusters. so the fact that the computer
industry focuses on this domain is appropriate.
> forward and with a 20 Mw power budget, an exascale machine's network would
> consume all the power leaving nothing for computation.
well, that sounds absurd - were they assuming a full-bisection fat tree of
older (hotter, lower fanout) generation interconnect?
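for what it's worth, the projection they're describing is easy to reproduce as back-of-envelope arithmetic. none of these figures come from the Intel talk - they're assumed placeholders chosen to show how flat joules-per-bit plus a fixed bytes-per-flop ratio eats a 20 MW budget:

```python
# assumed numbers, for illustration only
FLOPS = 1e18            # exascale: 10^18 flop/s
J_PER_BIT = 25e-12      # 25 pJ/bit moved, assumed flat across generations
BYTES_PER_FLOP = 0.1    # assumed communication intensity of the workload

bits_per_s = FLOPS * BYTES_PER_FLOP * 8
network_watts = bits_per_s * J_PER_BIT
print(network_watts / 1e6, "MW for the network alone")  # -> 20.0 MW
```

with those (made-up) numbers the network alone consumes the whole 20 MW budget, which is presumably the shape of the argument - and also why the assumptions (topology, bytes/flop, interconnect generation) matter so much.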