[Beowulf] ARM cpu's and development boards and research

Vincent Diepeveen diep at xs4all.nl
Tue Nov 27 07:46:37 PST 2012

I dug around in the prices of ARM CPUs and development boards.

If you just buy a handful, the most interesting offer seems to be


it's $129 and has a quad-core ARM Cortex-A9 at 1.4 GHz and a Mali GPU
on it.

So in terms of raw CPU cores this is by far the best offer on the
market; nothing even gets close to it. That is pretty amazing, as one
would expect a cheaper offer to exist. Yet if you want to run Linux
and tinker yourself, this is the only board that is quad-core
and affordable to some

By the way, I still find $129 pretty expensive for a board that is
produced for a few dollars and a CPU you can buy for $40 apiece in
low quantities, but it's acceptable if you just buy a handful.

The number of quad-core offers is pretty limited in the ARM world;
it's nearly all dual cores there.

This CPU should draw exactly 3 watts under full load at 1.0 GHz and a
tad more at 1.4 GHz, though I couldn't easily confirm the "1.4 GHz"
clock. It could be that the quad-core version is 1.0 GHz after all.

So I'd guess the power usage of the entire board is around 10 watts at
the wall socket, as the tiny adapters lose a lot when I measure here
(could be the way how i buy in of

Note that this is a 32-bit processor which has a single-precision GPU

The Cortex-A15, which maybe goes into production in 2013, is 64 bits
and has double precision.

For robotics, when I checked it out, a 32-bit CPU is more than enough
for the artificial intelligence for now,
yet more cores is more fun :)

A big hurdle is always that many of the dual-core ARMs out there do
not have fast RAM but only some sort of flash-type memory, which is
pretty slow for software that needs to hit the RAM regularly.

From my viewpoint dual-core ARMs are not interesting, yet considering
the cheap prices they sell for, I guess that's what we will keep
seeing produced.

From an HPC viewpoint, of course, a small ARM board with 64 quad-core
ARM CPUs on it, totalling 256 cores, would be more interesting,
provided you can get it for a cheap price. Judging from the power
usage of the CPU itself it should be possible to build this :)
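
As a rough sanity check of that idea, the arithmetic can be sketched in
Python. The inputs are the figures quoted in this post (3 watts per CPU
at 1.0 GHz, $30-$40 per CPU in small quantities). The names are made up
for illustration, and the power and cost of RAM, interconnect and the
board itself are left out, so this is a lower bound, not a design.

```python
# Back-of-envelope budget for a hypothetical board with 64 quad-core CPUs.
# Inputs are the figures quoted in this post; RAM, interconnect, voltage
# regulators and the board itself are deliberately ignored.

CPUS_PER_BOARD = 64
CORES_PER_CPU = 4
WATTS_PER_CPU = 3.0           # full load at 1.0 GHz, per the post
PRICE_PER_CPU_USD = (30, 40)  # low-quantity price range, per the post


def board_budget(cpus=CPUS_PER_BOARD):
    """Return total cores, CPU-only watts, and the CPU-only cost range."""
    cores = cpus * CORES_PER_CPU
    watts = cpus * WATTS_PER_CPU
    cost_low = cpus * PRICE_PER_CPU_USD[0]
    cost_high = cpus * PRICE_PER_CPU_USD[1]
    return cores, watts, (cost_low, cost_high)


if __name__ == "__main__":
    cores, watts, (low, high) = board_budget()
    print(f"{cores} cores, {watts:.0f} W for the CPUs alone, "
          f"${low}-${high} in CPUs")
```

With these inputs that is 256 cores at roughly 192 watts, and $1920 to
$2560 for the CPUs alone, which is why the CPU price dominates.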

If one were to produce several of those (I don't know the board
production costs), it would be easy to buy the ARM CPUs at a much
lower price. That is the real problem of all these projects: getting
the CPUs cheap.

I assume the problem then is to build a low-latency way of
communicating between the ARM CPUs.

Maybe someone wants to shed some light on that. I would really want to
be able to communicate in under 20 microseconds for blocking reads of,
say, 128 bytes or so, crisscross from CPU to CPU (so all CPUs doing
this to a random other CPU nonstop).

So for CPUs the bandwidth is less relevant than the latency.

For GPUs the bandwidth from node to node is everything.
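
To make the latency requirement concrete, here is a minimal sketch of
how such a blocking read could be timed. It is only a stand-in: it
bounces 128-byte messages between two threads over a local
socket.socketpair(), not between ARM nodes, and it does not implement
the "random other CPU" pattern. The function names and the round count
are made up for illustration, and Python's own overhead is included in
the result. On real hardware the socketpair would be replaced by a
connection to the other node.

```python
import socket
import threading
import time

MSG_SIZE = 128   # bytes per blocking read, as in the post
ROUNDS = 2000    # number of request/reply round trips to time


def recv_exact(sock, n):
    """Block until exactly n bytes have arrived (streams may split data)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(sock, rounds):
    """Stand-in for the remote CPU: echo every request back unchanged."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, MSG_SIZE))


def measure_round_trip_us(rounds=ROUNDS):
    """Return the mean round-trip time in microseconds."""
    a, b = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(b, rounds))
    server.start()
    payload = bytes(MSG_SIZE)
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)        # request
        recv_exact(a, MSG_SIZE)   # blocking read of the reply
    elapsed = time.perf_counter() - start
    server.join()
    a.close()
    b.close()
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    print(f"mean round trip: {measure_round_trip_us():.1f} us")
```

The mean hides outliers; for load tests the per-round times would need
to be kept and the worst cases inspected as well.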

Some years ago (nearly 10 years or so?) I was approached by email by
TU Delft, which had built an ARM board with 512 CPUs, yet they had a
somewhat clumsy network between them which wasn't low latency. If I
remember well it was 10 Mbit Ethernet, so in practice the latency is
nearly in the milliseconds.
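
A quick calculation shows why 10 Mbit Ethernet cannot get near a 20
microsecond target. The sketch below only counts the time the bits
spend on the wire, using the standard Ethernet frame layout (8 bytes
preamble and start delimiter, 14 bytes header, 4 bytes checksum, 46
bytes minimum payload). It assumes raw frames with no IP or UDP
headers, no inter-frame gap, no switch and no software stack, so real
latency is higher. The function names are made up for illustration.

```python
# Lower bound on 10 Mbit Ethernet latency from serialization time alone.

LINK_BITS_PER_US = 10.0       # 10 Mbit/s = 10 bits per microsecond
PREAMBLE_AND_SFD_BYTES = 8
HEADER_BYTES = 14             # destination, source, EtherType
FCS_BYTES = 4                 # frame check sequence (CRC)
MIN_PAYLOAD_BYTES = 46        # shorter payloads are padded to this size


def wire_time_us(payload_bytes):
    """Microseconds one frame with this payload occupies the wire."""
    payload = max(payload_bytes, MIN_PAYLOAD_BYTES)
    frame = PREAMBLE_AND_SFD_BYTES + HEADER_BYTES + payload + FCS_BYTES
    return frame * 8 / LINK_BITS_PER_US


def round_trip_wire_time_us(request_bytes, reply_bytes):
    """Wire time for a request frame followed by a reply frame."""
    return wire_time_us(request_bytes) + wire_time_us(reply_bytes)


if __name__ == "__main__":
    print(f"128-byte reply:  {wire_time_us(128):.1f} us")
    print(f"read round trip: {round_trip_wire_time_us(1, 128):.1f} us")
```

A 128-byte reply alone occupies the wire for about 123 microseconds,
and a minimal request plus that reply for about 181 microseconds,
before any software overhead is counted.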

I couldn't get more details than that, as they didn't answer any other
email. It seems they thought they could just get the source code of
the chess program, compile it, run it on the board and then do
something else.

Of course that's not how it can work. It takes years to get something
like that working well on such hardware, which means I would need such
a board in my own office, or nearly daily access to it, to experiment
and run tests, especially latency and load tests to see how the
hardware responds :)

With the Cortex-A15 being 64 bits and having double precision
capabilities, this sort of project gets interesting once again for a
few specific applications that really need a tad of L2 cache and fast
scattered access to the RAM.

Yet such projects do not make sense unless you manage to buy the CPUs
dirt cheap. If we assume that today's price of the A9 quad-cores will
be the price of the Cortex-A15 within 2 years, then that price of
$30-$40 right now for each CPU, if you buy 100, is really too high.

Maybe by then I'll call a manufacturer and negotiate a price for
thousands of ARM CPUs. Possibly several scientific organisations and
universities here will then want a bunch of them at purchase price,
keeping the price low for us all.

From my viewpoint they want to make too much profit on those
quad-core A9 ARMs right now.

On Nov 27, 2012, at 2:59 PM, Eugen Leitl wrote:

> On Tue, Nov 27, 2012 at 09:10:32AM +0100, Jonathan Aquilina wrote:
>> Hey guys I was looking at the hadoop page and it got me wondering.
>> is it possible to cluster together storage servers? If so how
>> efficient would a cluster of them be?
> An interesting problem would be to use reasonably powerful but
> cheap ARM SoCs in few GBytes onboard RAM and some flash
> for hybrid filesystems for each hard drive, and cluster
> them via GBit Ethernet on a very large scale.
> That would be a custom Beowulf for more storage-related
> tasks. E.g. an application I have in mind are volumetric
> datasets with e.g. 8 nm - voxels for biological systems,
> which are way too large to process in memory.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
