[Beowulf] [tt] One million ARM chips challenge Intel bumblebee

Mark Hahn hahn at mcmaster.ca
Sat Jul 16 13:19:43 PDT 2011

>>> How long is it going to take to wire them all up? And how fast are they
>>> going to fail? If there's a MTBF of one million hours, that's still one
>>> failure per hour.
> They do address some of that in ftp://ftp.cs.man.ac.uk/pub/amulet/papers/SBF_ACSD09.pdf
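
for reference, the failure arithmetic quoted above can be sketched as below. this assumes independent parts with a constant failure rate (an exponential model), which is a simplification and not something the paper or the quoted post states; the function names are just for illustration.

```python
import math

def expected_failures_per_hour(n_parts: int, mtbf_hours: float) -> float:
    # ASSUMPTION: parts fail independently at a constant rate of 1/MTBF each,
    # so the aggregate failure rate is simply the sum of the per-part rates.
    return n_parts / mtbf_hours

def prob_at_least_one_failure(n_parts: int, mtbf_hours: float, hours: float) -> float:
    # Under the same exponential assumption, failures form a Poisson process.
    rate = expected_failures_per_hour(n_parts, mtbf_hours)
    return 1.0 - math.exp(-rate * hours)

# One million parts, each with an MTBF of one million hours.
rate = expected_failures_per_hour(1_000_000, 1_000_000.0)
print(f"{rate} expected failure(s) per hour")
print(f"P(at least one failure in an hour) = {prob_at_least_one_failure(1_000_000, 1_000_000.0, 1.0):.3f}")
```

so the "one failure per hour" figure is the expected rate under that model; whether the count is per core or per chip changes the number of parts, not the method.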

the 1m proc figure seems to refer to cores, of which their current SOC
has 20/chip, and there are 4 chips on their current test board.

hmm, that article says 18 cores (maybe reduced for yield).  stacked dram; not
sure what the other companion chip on the test board is.

anyway, compare it to the K computer: 516096 compute cores, 64512 packages,
versus 50k packages for Spinnaker.  Spinnaker will obviously put more 
chips onto a single board (board links are more reliable than connectors,
as well as more power-efficient.)  Spinnaker has 6 links for a 2d toroidal
mesh (not 3d for some reason) - K also uses a 6-link mesh.
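
a quick check of the package arithmetic, using only the figures quoted in this thread (1m cores, 20 or 18 cores/chip, and the K numbers above). the helper name is made up for illustration.

```python
# Figures taken from the discussion above.
K_CORES = 516_096
K_PACKAGES = 64_512
SPINNAKER_CORES = 1_000_000

def packages_needed(total_cores: int, cores_per_chip: int) -> int:
    # Ceiling division: a partly used chip still counts as a whole package.
    return -(-total_cores // cores_per_chip)

k_cores_per_package = K_CORES // K_PACKAGES
print(f"K: {k_cores_per_package} cores per package")
for cores_per_chip in (20, 18):
    n = packages_needed(SPINNAKER_CORES, cores_per_chip)
    print(f"Spinnaker at {cores_per_chip} cores/chip: {n} packages")
```

at 20 cores/chip this gives the 50k packages mentioned above; at 18 cores/chip the count is somewhat higher.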

obviously, off-board links need a connector, but if I were designing either 
box I'd have each board plug into a per-rack backplane, again, to avoid
dealing with cables.  if you have a per-rack sub-mesh anyway, it should 
be 3d, shouldn't it?
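
to put a rough number on the 2d-vs-3d question, here is a minimal sketch that measures the worst-case hop count of two 6-link tori with the same node count (4096). the 2d link layout (+-x, +-y, plus one diagonal pair) is my assumption about how 6 links get used in a 2d toroidal mesh, and the sizes are arbitrary; nothing here comes from the Spinnaker or K documentation.

```python
from collections import deque

def max_hops(dims, link_offsets):
    """Breadth-first search from node 0 on a torus; returns the largest hop count.

    Every node has the same set of links, so the worst case seen from
    node 0 equals the network diameter.
    """
    start = tuple(0 for _ in dims)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for off in link_offsets:
            # Wrap each coordinate around its dimension (toroidal links).
            nxt = tuple((c + o) % d for c, o, d in zip(node, off, dims))
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

# ASSUMPTION: 6-link 2d torus uses +-x, +-y and one diagonal pair.
LINKS_2D = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
# 6-link 3d torus: +-x, +-y, +-z.
LINKS_3D = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

hops_2d = max_hops((64, 64), LINKS_2D)       # 4096 nodes
hops_3d = max_hops((16, 16, 16), LINKS_3D)   # 4096 nodes
print(f"2d: {hops_2d} hops worst case, 3d: {hops_3d} hops worst case")
```

same link count per node, but the 3d arrangement needs fewer hops in the worst case at this size. this says nothing about wiring cost, which is the other half of the tradeoff.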

in the abstract, it seems like Spinnaker would want a 3d mesh to better model
the failure effect in the brain (which is certainly not 2d nearest-neighbor!)
in fact, if you wanted to embrace brain-like topologies, I'd think a 
flat-network-neighborhood would be most realistic (albeit cable-intensive.
but we're not afraid of failed cables, since the brain isn't!)
