[Beowulf] Chinese 16 core CPU uses message passing

Mark Hahn hahn at mcmaster.ca
Wed Feb 22 14:08:22 PST 2012

> (experimental chip. not Godson)
> http://semiaccurate.com/2012/02/21/chinese-16-core-cpu-uses-message-passing/

unfortunately, little real info there.

> Gone are the old days of massive shared memory architectures.

weird.  has this writer looked at chip diagrams for the past few years?
does 16 cores per memory interface (AMD) count as massive?  yes, most 
systems are CC-NUMA, but the number of "massive" CC-NUMA systems can
really be spelled with three letters: SGI.  and those are basically boutique.

> During the first day of ISSCC in San Francisco research from Fudan University
> in Shanghai described a brand new microprocessor

"brand new microprocessor" is a sort of funny phrase.  lots of things have 
been tried before, including, afaict, everything in this chip.
this paper seems similar to http://dx.doi.org/10.1109/ICSICT.2010.5667778
(some of the same authors) which also involves a network-on-chip 
and "extended register file".  it's based on MIPS32, which is a pretty
popular choice for arch experiments.

> that does away with the
> traditional shared memory architecture.

not really.

>    Photo courtesy of Fudan
> The advantage of using a message passing scheme is that it scales much better
> than the shared memory.

apples scale better than oranges, too.
the duality of MP and SM is not a new concept - not that we have such a great
handle on it.
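
to make the duality concrete, here is a minimal sketch (an illustration only,
nothing to do with the Fudan chip or its hardware mailboxes): message passing
layered on ordinary shared memory.  `Mailbox` and `run_demo` are hypothetical
names made up for this example, and Python threads stand in for cores.

```python
import threading
from collections import deque


class Mailbox:
    """Message passing built on shared memory: a lock-protected queue.

    Hypothetical illustration; not the hardware mailbox from the paper.
    """

    def __init__(self):
        self._slots = deque()                 # the shared memory region
        self._cond = threading.Condition()    # guards access to _slots

    def send(self, msg):
        # Writer side: store the message, then wake one waiting receiver.
        with self._cond:
            self._slots.append(msg)
            self._cond.notify()

    def recv(self):
        # Reader side: block until a message is available, then take it.
        with self._cond:
            while not self._slots:
                self._cond.wait()
            return self._slots.popleft()


def run_demo(n=5):
    """One producer and one consumer exchange n messages through a Mailbox."""
    box = Mailbox()
    received = []

    def consumer():
        for _ in range(n):
            received.append(box.recv())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        box.send(i * i)
    t.join()
    return received
```

the consumer never touches the producer's data directly, only what arrives in
the mailbox, yet underneath it is all loads and stores to one shared queue.
the reverse construction (shared memory emulated by request/reply messages to
an owner) works too, which is why "MP scales, SM doesn't" is too glib.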

> Whereas shared memory relies on software,

yikes.  oversimplify much?

> the message
> passing scheme has been implemented using mailboxes designed in hardware,
> according to the research paper that was presented at ISSC.  The processor
> itself consists of 16 RISC cores that share two small cores for shared memory
> access,

I'd prefer to see it described as two clusters, each with 8 compute cores
surrounding a memory core, with all cores on an in-chip network, but
(presumably) no coherency between the two memory cores.  the diagram makes the
chip look to be focused on stream processing (the related paper uses
Reed-Solomon decoding as its test load).

> but much of the communication is done using message passing. The
> processor also does away with the traditional caches and instead implements
> an extended register file.

well, I think I'd call the MCore a cache; if you do, the diagram looks much
more conventional...

I love experimental chips and arch; I wish this paper were available already.
but the field is very well-plowed - that doesn't detract from its fertility.

what I _don't_ see in my sampling of current papers is any attempt to create
a new or improved programming model that can nicely scale, both in terms of 
architecture and productivity, to systems of many cores.  I'm also pretty
convinced that one needs to start with a model that doesn't start with
separate boxes labeled "cpu" and "memory".
