[Beowulf] Intel: 1,000-core processor possible

Eugen Leitl eugen at leitl.org
Mon Nov 22 07:50:07 PST 2010


Intel: 1,000-core processor possible

A group of Intel researchers has pioneered a messaging system that would
allow multiple cores to communicate

Joab Jackson (IDG News Service) 20 November, 2010 09:04

An experimental Intel chip shows the feasibility of building processors with
1,000 cores, an Intel researcher has asserted.

The architecture for the Intel 48-core Single Chip Cloud Computer[1] (SCC)
processor is "arbitrarily scalable," said Intel researcher Timothy Mattson
during a talk at the Supercomputing 2010 (SC10) conference, held this week in
New Orleans.

"This is an architecture that could, in principle, scale to 1,000 cores," he
said. "I can just keep adding, adding, adding cores."

Only beyond 1,000 cores or so would the diameter of the mesh, the on-chip
network connecting the many cores, grow large enough to hurt performance,
Mattson said.

Intel remains adamant that the future progress of microprocessors will depend
on packing ever more cores onto a chip. As more cores are added, however,
Intel designers must confront the problem of scalability.

Initial multicore chip architectures depended on a set of protocols that
assures that each core has the same view of the system's memory, a technique
called cache coherency.

As more cores are added to chips, this approach becomes problematic because
"the protocol overhead per core grows with the number of cores, leading
to a 'coherency wall' beyond which the overhead exceeds the value of adding
cores," the paper accompanying Mattson's talk noted.
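
The "coherency wall" can be illustrated with a toy model. The sketch below is
not taken from the paper: it assumes, purely for illustration, that every core
loses a fixed fraction of its capacity to coherency traffic for each core on
the chip. The function names and the constant 0.01 are hypothetical.

```python
def useful_throughput(cores: int, overhead_per_peer: float) -> float:
    """Toy model: aggregate throughput when each core loses
    `overhead_per_peer` of its capacity for every core on the chip."""
    per_core = max(0.0, 1.0 - overhead_per_peer * cores)
    return cores * per_core


def coherency_wall(overhead_per_peer: float, max_cores: int = 10_000) -> int:
    """Smallest core count at which adding one more core no longer
    raises aggregate throughput in the toy model."""
    for n in range(1, max_cores):
        gain = (useful_throughput(n + 1, overhead_per_peer)
                - useful_throughput(n, overhead_per_peer))
        if gain <= 0:
            return n
    return max_cores


# Hypothetical overhead of 1% of a core's capacity per core on the chip.
print(coherency_wall(0.01))
```

In this model the wall moves out as the per-core overhead shrinks, but it never
disappears, which is the motivation for dropping coherency altogether.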

Mattson has argued[2] that a better approach would be to eliminate cache
coherency and instead allow cores to pass messages among one another.

The recent work of the design team has centered on developing message-passing
techniques for the chip that would scale as more cores are added.

Designed by Intel's TeraScale Research Program over the past several years,
the chip itself is an experimental one and is not on the Intel product road
map, Mattson said. A limited number of copies have been distributed to
researchers and developers so they can build development tools for the

The chip, first fabricated with a 45-nanometer process at Intel facilities
about a year ago, is actually a six-by-four array of tiles, each tile
containing two cores. It has more than 1.3 billion transistors and consumes
from 25 to 125 watts.
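
The tile layout makes Mattson's remark about mesh diameter easy to quantify.
The sketch below assumes one router per tile, a plain 2D mesh with no
wraparound links, and a near-square layout for the hypothetical 1,000-core
chip; none of these details are stated in the article, and the helper names
are illustrative.

```python
import math

CORES_PER_TILE = 2  # from the article: each tile contains two cores


def mesh_diameter(rows: int, cols: int) -> int:
    """Worst-case hop count between two routers in a rows x cols 2D mesh."""
    return (rows - 1) + (cols - 1)


def squarest_grid(tiles: int) -> tuple[int, int]:
    """Return the factor pair (rows, cols) of `tiles` closest to a square."""
    rows = math.isqrt(tiles)
    while tiles % rows != 0:
        rows -= 1
    return rows, tiles // rows


for cores in (48, 1000):
    rows, cols = squarest_grid(cores // CORES_PER_TILE)
    print(f"{cores} cores: {rows}x{cols} tiles, "
          f"diameter {mesh_diameter(rows, cols)} hops")
```

Under these assumptions the 48-core chip has a worst-case path of 8 hops,
while a 1,000-core chip laid out as 20 by 25 tiles would need 43.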

For simplicity's sake, the team used an off-the-shelf 1994-era Pentium
processor design for the cores themselves. "Performance on this chip is not
interesting," Mattson said. It uses a standard x86 instruction set.

The novelty of this processor is in its tiled architecture and the network
and address infrastructure. Each core has a "mesh interface component" that
packages data into packets and connects to an on-board router. Each tile also
has a "message-passing buffer," with 16 kilobytes of random access memory.

The team has tried various approaches to streamline the ability of the
processor to pass messages among the many cores.

By running the TCP/IP protocol over the data link layer, the team was able
to run a separate Linux-based operating system on each core. Mattson noted
that while it would be possible to run a 48-node Linux cluster on the chip,
it "would be boring."

"To make this interesting, I would have to ask, how would the programming
models map onto the unique features of this chip," he said.

The team also developed a small API (application programming interface)
library for message passing among the cores, called RCCE, which Mattson
pronounced "Rocky."
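
For readers unfamiliar with the programming model, here is a minimal sketch of
message passing without shared mutable state. It does not use RCCE or
reproduce its API: Python threads stand in for cores, and one-slot queues
stand in for per-core message buffers. The function name `ring_sum` is
hypothetical.

```python
import queue
import threading


def ring_sum(values):
    """Sum `values` by passing a running total from one "core" to the next."""
    n = len(values)
    if n == 0:
        return 0
    # One single-slot inbox per core; a core only reads its own inbox.
    inboxes = [queue.Queue(maxsize=1) for _ in range(n)]
    result = queue.Queue(maxsize=1)

    def core(rank):
        if rank == 0:
            partial = values[0]
        else:
            # Blocking receive from the previous core in the chain.
            partial = inboxes[rank].get() + values[rank]
        if rank == n - 1:
            result.put(partial)             # last core reports the total
        else:
            inboxes[rank + 1].put(partial)  # send to the next core

    threads = [threading.Thread(target=core, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get()


print(ring_sum(list(range(48))))  # one value per core on a 48-core chip
```

Each core touches only its own value and its own inbox, so no cache-coherent
view of memory is needed to get the right answer.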

In tests, the team showed that message passing among the cores could be just
as speedy using RCCE as with a TCP/IP-based Linux cluster. Both results bode
well for message passing as a means of inter-core communication.

"Our preliminary work has demonstrated that the SCC processor and its native
message passing API provide an effective software development platform," the
paper concludes. "The expected difficulties due to the lack of asynchronous
message passing have so far not materialized."

In addition to talking about the chip's message-passing capabilities, Mattson
also elaborated on SCC's power-saving capabilities[3]. The frequency of each
tile can be varied. Hooks are provided that allow programs to adjust the
frequency and even the voltage of the cores they are running on. This
feature will, however, create a new challenge for programmers, he warned.

"It's a lot harder than you'd think to look at your program and think 'how
many volts do I really need?'" he said.
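
To see why voltage matters so much, a rough sketch helps. It uses the
standard textbook approximation that dynamic CMOS power scales with
capacitance times voltage squared times frequency; this relation and the
operating point below are not from the article, and the function name is
illustrative.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to baseline under the P ~ C * V^2 * f
    approximation, with capacitance held constant."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical operating point: 80% of nominal voltage at half the frequency.
print(f"{relative_dynamic_power(0.8, 0.5):.2f}")
```

Under this approximation the hypothetical setting draws roughly a third of
the baseline dynamic power, which is why a voltage knob is worth the extra
programming effort.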

Joab Jackson covers enterprise software and general technology breaking news
for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson[4]. Joab's
e-mail address is Joab_Jackson at idg.com[5].






