<div dir="ltr">Adapteva's CEO, Andreas Olofsson, gave a well-attended talk Friday at ORNL on how to program a 16,000-core chip, though it was really more about the architecture and design choices than about actually programming a 16K-core chip. It is all the more impressive given that the work was done by a team of three over a period of three months.<div>
<br></div><div>The cores are simple dual-issue RISC, each with 32 KB of scratchpad memory and a network router. There is no cache and no coherency protocol. Every core can read/write every other core's memory, so the chip can appear as a distributed shared-memory machine: non-local accesses are automatically converted to network transactions and sent out over the NoC. Nearest-neighbor latency is 4 ns for writes and 16 ns for reads; farthest-neighbor writes are 16 ns and reads 30 ns. The cores form a 2D mesh, and routing is east/west, then north/south. He claims they could build a 1,024-core chip today if there were demand for it.</div>
<div><br></div><div>The initial markets are telecom, military, and medical, and the applications best suited to it are those that would otherwise need a DSP. For HPC, they claim 102 GF/s at 2 watts (51 GF/watt), which is almost exascale class (i.e., 1 EF/s at 20 MW, ignoring cooling, networks, etc.). It currently has only single-precision floating point; they can add double precision given enough demand. Depending on the memory configured per core, it could provide a double-precision peak roughly 30-40% below the current board.</div>
<div><br></div><div>They support C/C++ and OpenCL. In practice, the OpenCL is converted to C++, and C++ use is constrained by the limited amount of memory. That said, if the bulk of your program fits in under 1,500 lines of C, he asserts that it will scream.</div>
<div><br></div><div>Lastly, once all the Kickstarter boards go out, they hope to have them available on Amazon for immediate delivery.</div><div><br></div><div>Scott</div><div><br></div></div><div class="gmail_extra"><br>
<br><div class="gmail_quote">On Fri, May 23, 2014 at 9:32 AM, Eugen Leitl <span dir="ltr"><<a href="mailto:eugen@leitl.org" target="_blank">eugen@leitl.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
After finally getting my Kickstarter backer board and setting it<br>
up to boot (you will need the included heatsink on the Zynq 7020<br>
as well as a small fan), I ran a few of the included benchmarks.<br>
<br>
In no particular order of relevance:<br>
<br>
linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_all2one> ./run.sh<br>
0x0000417e!<br>
The bandwidth of all-to-one is 4193.00MB/s!<br>
<br>
<br>
linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_bisection> ./run.sh<br>
0x00000f46!<br>
The bandwidth of bisection is 9590.00MB/s!<br>
<br>
linaro-nano:~/Parallella/epiphany-examples/basic_math> ./run.sh<br>
<br>
The clock cycle count for addition is 5.<br>
<br>
The clock cycle count for subtraction is 5.<br>
<br>
The clock cycle count for multiplication is 6.<br>
<br>
The clock cycle count for division is 47.<br>
<br>
The clock cycle count for "fmodf()" is 66635.<br>
<br>
The clock cycle count for "sinf()" is 23930.<br>
<br>
The clock cycle count for "cosf()" is 51115.<br>
<br>
The clock cycle count for "sqrtf()" is 93785.<br>
<br>
The clock cycle count for "ceilf()" is 18475.<br>
<br>
The clock cycle count for "floorf()" is 17690.<br>
<br>
The clock cycle count for "log10f()" is 10735.<br>
<br>
The clock cycle count for "logf()" is 9976.<br>
<br>
The clock cycle count for "powf()" is 348243.<br>
<br>
The clock cycle count for "ldexpf()" is 36306.<br>
<br>
linaro-nano:~/Parallella/epiphany-examples/matmul-16> ./run.sh<br>
<br>
Matrix: C[512][512] = A[512][512] * B[512][512]<br>
<br>
Using 4 x 4 cores<br>
<br>
Seed = 0.000000<br>
Loading program on Epiphany chip...<br>
Writing C[1048576B] to address 00200000...<br>
Writing A[1048576B] to address 00000000...<br>
Writing B[1048576B] to address 00100000...<br>
GO Epiphany! ... Writing the GO!...<br>
Done...<br>
Finished calculating Epiphany result.<br>
Reading result from address 00200000...<br>
Calculating result on Host ... Finished calculating Host result.<br>
Reading time from address 00300008...<br>
<br>
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***<br>
Verifying result correctness ... C_epiphany == C_host<br>
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***<br>
<br>
Epiphany - time: 153.0 msec (@ 600 MHz)<br>
Host - time: 1867.2 msec (@ 667 MHz)<br>
<br>
* * * EPIPHANY FTW !!! * * *<br>
<br>
I can run the rest of the examples and post numbers if there's<br>
interest:<br>
<br>
linaro-nano:~/Parallella/epiphany-examples> ls -la<br>
total 152<br>
drwxrwxr-x 36 linaro linaro 4096 May 22 15:46 ./<br>
drwxrwxr-x 5 linaro linaro 4096 Mar 7 12:09 ../<br>
drwxrwxr-x 8 linaro linaro 4096 Mar 6 23:47 .git/<br>
-rw-rw-r-- 1 linaro linaro 227 Mar 6 23:42 .gitignore<br>
-rw-rw-r-- 1 linaro linaro 1464 Mar 6 23:42 README.md<br>
drwxrwxr-x 4 linaro linaro 4096 May 17 11:47 assembly/<br>
drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:44 basic_math/<br>
drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:47 clockgating_mode/<br>
drwxrwxr-x 4 linaro linaro 4096 May 17 11:48 ctimer/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_2d/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_chain/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_interrupt/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_message_read/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_message_write/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 dma_slave/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 15:48 e-dump-mem/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 15:46 e-dump-regs/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 e-mem-sync/<br>
drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:43 e-toggle-led/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 12:48 emesh_read_latency/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 12:48 emesh_traffic/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 erm/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 erm_example/<br>
drwxrwxr-x 4 linaro linaro 4096 Mar 6 23:42 fft2d/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hardware_barrier/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hardware_loops/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 hello_parallella/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 interrupts/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 link_lowpower_mode/<br>
drwxrwxr-x 4 linaro linaro 4096 Mar 7 02:04 matmul-16/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 mem_protect/<br>
drwxrwxr-x 4 linaro linaro 4096 May 23 13:26 mesh_bandwidth_all2one/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 12:42 mesh_bandwidth_bisection/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 12:41 mesh_bandwidth_neighbour/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 mutex/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 nested_interrupts/<br>
drwxrwxr-x 3 linaro linaro 4096 Mar 6 23:42 register_test/<br>
drwxrwxr-x 4 linaro linaro 4096 May 22 12:07 remote_call/<br>
<br>
_______________________________________________<br>
Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>
To change your subscription (digest mode or unsubscribe) visit <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
</blockquote></div><br></div>