[Beowulf] Parallella Epiphany performance

Douglas Eadline deadline at eadline.org
Mon Jun 9 12:53:40 PDT 2014


Like Eugen, I have a Parallella board (it just arrived over the weekend).
I'm not sure when I will have time to play with it, as I'm quite busy
right now with some other projects. Currently the only
way to connect multiple boards is through Ethernet.
There are on-board high-speed links for connecting multiple boards,
but as I have read, "Connecting boards with eLink cables is
still work in progress.."

--
Doug



> Adapteva's CEO, Andreas Olofsson, gave a talk Friday at ORNL, which was
> very well attended. He gave an interesting talk about how to program a
> 16,000 core chip, which was more about the architecture and design choices
> than actually programming a 16K core chip. It is most impressive given
> that it was a team of three over a period of three months.
>
> The cores are simple, dual-issue RISC with 32 KB of scratch pad and a
> network router. There is no cache or coherency protocol. Every core can
> read/write every other core's memory so that it can appear as a
> distributed, shared memory machine. Non-local accesses are automatically
> converted to network calls and sent out over the NoC. Nearest-neighbor
> latency is 4 ns for writes and 16 ns for reads. Farthest-neighbor latency
> is 16 ns for writes and 30 ns for reads. Routing is east/west then
> north/south. The cores form a 2D mesh. He claims that they can build a
> 1,024 core chip today if there is demand for it.
>
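To make the "east/west then north/south" routing concrete, here is a
minimal sketch of dimension-ordered routing on a 2D mesh. It is an
illustration only, not Adapteva's router logic: the (row, col)
coordinates and the function names xy_route and hops are made up for
this example, and it counts hops without modeling the latencies quoted
above.

```python
def xy_route(src, dst):
    """Return the (row, col) nodes visited from src to dst, inclusive.

    Hypothetical helper: moves along the row first (east/west), then
    along the column (north/south), as described in the talk summary.
    """
    row, col = src
    dst_row, dst_col = dst
    path = [(row, col)]

    # East/west phase: step the column index until it matches.
    step = 1 if dst_col > col else -1
    while col != dst_col:
        col += step
        path.append((row, col))

    # North/south phase: step the row index until it matches.
    step = 1 if dst_row > row else -1
    while row != dst_row:
        row += step
        path.append((row, col))

    return path


def hops(src, dst):
    """Number of router-to-router links crossed on the route."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    # Corner to corner on a 4 x 4 mesh (the 16-core layout).
    print(xy_route((0, 0), (3, 3)))
    print("hops:", hops((0, 0), (3, 3)))
```

Because every packet finishes its east/west leg before turning, two
cores always use the same path in a given direction, which keeps the
routers simple.
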
> The initial markets are telecom, military, and medical, and the
> applications best suited for it would need a DSP. For HPC, they claim
> 102 GF/s at 2 watts (51 GF/watt), which is almost exascale class (i.e.
> 1 EF/s at 20 MW, ignoring cooling, networks, etc). It only has
> single-precision floating point currently. They can add double-precision
> given enough demand. Depending on the memory per core configured, it
> could provide a double-precision peak performance about 30-40% less than
> the current board.
>
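The exascale arithmetic above is easy to check. The sketch below uses
only the two claimed figures (102 GF/s, 2 W); the function name
power_for_target_mw is made up for illustration, and like the original
estimate it ignores cooling, networks, memory and everything else.

```python
CHIP_GFLOPS = 102.0   # claimed single-precision peak per chip
CHIP_WATTS = 2.0      # claimed power per chip
EXAFLOP_IN_GFLOPS = 1e9  # 1 EF/s = 1e18 F/s = 1e9 GF/s


def gflops_per_watt(gflops=CHIP_GFLOPS, watts=CHIP_WATTS):
    return gflops / watts


def power_for_target_mw(target_gflops, efficiency_gf_per_w):
    """Power in megawatts needed to reach target_gflops (chips only)."""
    return target_gflops / efficiency_gf_per_w / 1e6


if __name__ == "__main__":
    eff = gflops_per_watt()
    print(f"efficiency: {eff:.1f} GF/W")
    print(f"1 EF/s needs about {power_for_target_mw(EXAFLOP_IN_GFLOPS, eff):.1f} MW")
    print(f"chips needed: about {EXAFLOP_IN_GFLOPS / CHIP_GFLOPS:.2e}")
```

At 51 GF/W the chip-only power for 1 EF/s comes out just under 20 MW,
which is where the "almost exascale class" remark comes from.
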
> They support C/C++ and OpenCL. Actually, the latter is converted to C++,
> and C++ is limited given the limited amount of memory. That said, if the
> bulk of your program can fit under 1,500 lines of C, he asserts that it
> will scream.
>
> Lastly, once all the Kickstarter boards go out, they hope to have them
> available on Amazon for immediate delivery.
>
> Scott
>
>
>
> On Fri, May 23, 2014 at 9:32 AM, Eugen Leitl <eugen at leitl.org> wrote:
>
>>
>> After I've finally gotten my Kickstarter backer board and set it
>> up to boot (you will need the included heatsink on the Zynq 7020
>> as well as a small fan) I've run a few included benchmarks.
>>
>> In no particular order of relevance:
>>
>> linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_all2one> ./run.sh
>> 0x0000417e!
>> The bandwidth of all-to-one is 4193.00MB/s!
>>
>>
>> linaro-nano:~/Parallella/epiphany-examples/mesh_bandwidth_bisection> ./run.sh
>> 0x00000f46!
>> The bandwidth of bisection is 9590.00MB/s!
>>
>> linaro-nano:~/Parallella/epiphany-examples/basic_math> ./run.sh
>>
>> The clock cycle count for addition is 5.
>>
>> The clock cycle count for subtraction is 5.
>>
>> The clock cycle count for multiplication is 6.
>>
>> The clock cycle count for division is 47.
>>
>> The clock cycle count for "fmodf()" is 66635.
>>
>> The clock cycle count for "sinf()" is 23930.
>>
>> The clock cycle count for "cosf()" is 51115.
>>
>> The clock cycle count for "sqrtf()" is 93785.
>>
>> The clock cycle count for "ceilf()" is 18475.
>>
>> The clock cycle count for "floorf()" is 17690.
>>
>> The clock cycle count for "log10f()" is 10735.
>>
>> The clock cycle count for "logf()" is 9976.
>>
>> The clock cycle count for "powf()" is 348243.
>>
>> The clock cycle count for "ldexpf()" is 36306.
>>
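For a feel of what these cycle counts mean in wall-clock terms, here is
a small conversion sketch. It assumes the cores ran at 600 MHz, which is
the clock the matmul-16 run in this thread reports; the basic_math
output itself does not state a clock, so treat that as an assumption.
The helper name cycles_to_us is made up for illustration.

```python
CLOCK_HZ = 600e6  # assumption: same 600 MHz clock reported by matmul-16

# A few of the cycle counts from the basic_math output above.
CYCLES = {
    "addition": 5,
    "division": 47,
    "sqrtf": 93785,
    "powf": 348243,
}


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6


if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        print(f"{name:10s} {cycles:7d} cycles  {cycles_to_us(cycles):10.3f} us")
```

Under that assumption the library calls land in the tens to hundreds of
microseconds each, several orders of magnitude above the basic
arithmetic operations.
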
>> linaro-nano:~/Parallella/epiphany-examples/matmul-16> ./run.sh
>>
>> Matrix: C[512][512] = A[512][512] * B[512][512]
>>
>> Using 4 x 4 cores
>>
>> Seed = 0.000000
>> Loading program on Epiphany chip...
>> Writing C[1048576B] to address 00200000...
>> Writing A[1048576B] to address 00000000...
>> Writing B[1048576B] to address 00100000...
>> GO Epiphany! ...   Writing the GO!...
>> Done...
>> Finished calculating Epiphany result.
>> Reading result from address 00200000...
>> Calculating result on Host ...   Finished calculating Host result.
>> Reading time from address 00300008...
>>
>> *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
>> Verifying result correctness ...   C_epiphany == C_host
>> *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
>>
>> Epiphany -  time:     153.0 msec  (@ 600 MHz)
>> Host     -  time:    1867.2 msec  (@ 667 MHz)
>>
>> * * *   EPIPHANY FTW !!!   * * *
>>
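The two timings above can be turned into a speedup and a rough
sustained rate. This sketch assumes the conventional 2*N^3 operation
count for a dense N x N matrix multiply; the benchmark may count
operations differently, so the GFLOP/s figures are estimates. The
function names are made up for illustration.

```python
N = 512                 # matrix dimension from the run above
EPIPHANY_SEC = 0.1530   # 153.0 msec
HOST_SEC = 1.8672       # 1867.2 msec


def matmul_flops(n):
    """Assumed operation count: one multiply and one add per inner step."""
    return 2 * n**3


def gflops(n, seconds):
    return matmul_flops(n) / seconds / 1e9


if __name__ == "__main__":
    print(f"speedup:  {HOST_SEC / EPIPHANY_SEC:.1f}x")
    print(f"Epiphany: {gflops(N, EPIPHANY_SEC):.2f} GFLOP/s")
    print(f"Host:     {gflops(N, HOST_SEC):.2f} GFLOP/s")
```

Note that the Epiphany time includes whatever the benchmark chose to
time, so this is a rate for the whole run rather than a peak figure.
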
>> I can run the rest of the examples and post numbers if there's
>> interest:
>>
>> linaro-nano:~/Parallella/epiphany-examples> ls -la
>> total 152
>> drwxrwxr-x 36 linaro linaro 4096 May 22 15:46 ./
>> drwxrwxr-x  5 linaro linaro 4096 Mar  7 12:09 ../
>> drwxrwxr-x  8 linaro linaro 4096 Mar  6 23:47 .git/
>> -rw-rw-r--  1 linaro linaro  227 Mar  6 23:42 .gitignore
>> -rw-rw-r--  1 linaro linaro 1464 Mar  6 23:42 README.md
>> drwxrwxr-x  4 linaro linaro 4096 May 17 11:47 assembly/
>> drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:44 basic_math/
>> drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:47 clockgating_mode/
>> drwxrwxr-x  4 linaro linaro 4096 May 17 11:48 ctimer/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_2d/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_chain/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_interrupt/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_message_read/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_message_write/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 dma_slave/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 15:48 e-dump-mem/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 15:46 e-dump-regs/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 e-mem-sync/
>> drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:43 e-toggle-led/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 12:48 emesh_read_latency/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 12:48 emesh_traffic/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 erm/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 erm_example/
>> drwxrwxr-x  4 linaro linaro 4096 Mar  6 23:42 fft2d/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hardware_barrier/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hardware_loops/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 hello_parallella/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 interrupts/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 link_lowpower_mode/
>> drwxrwxr-x  4 linaro linaro 4096 Mar  7 02:04 matmul-16/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 mem_protect/
>> drwxrwxr-x  4 linaro linaro 4096 May 23 13:26 mesh_bandwidth_all2one/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 12:42 mesh_bandwidth_bisection/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 12:41 mesh_bandwidth_neighbour/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 mutex/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 nested_interrupts/
>> drwxrwxr-x  3 linaro linaro 4096 Mar  6 23:42 register_test/
>> drwxrwxr-x  4 linaro linaro 4096 May 22 12:07 remote_call/
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>

