[Beowulf] GPU based cluster for machine learning purposes
pietrushnic at gmail.com
Thu Apr 10 13:30:50 PDT 2014
On Thu, Apr 10, 2014 at 02:07:40PM +0000, Lux, Jim (337C) wrote:
> On 4/10/14 5:28 AM, "Piotr Król" <pietrushnic at gmail.com> wrote:
> By the time you buy a power supply, add memory, I/o shields, vestigial
> chassis, your little $50 motherboard is now a $300 computer.
The question is whether they can beat $14.7 per DP GFLOPS (13 GFLOPS
for $192)?
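
To make the comparison concrete, here is a minimal sketch of the
price/performance arithmetic. The function names are mine (hypothetical),
and the only inputs are the figures already mentioned in this thread: 13
DP GFLOPS for $192, and the $300 estimate for a complete small node.

```python
def dollars_per_gflops(price_usd, gflops):
    """Cost of one GFLOPS of peak throughput, in dollars."""
    if gflops <= 0:
        raise ValueError("gflops must be positive")
    return price_usd / gflops


def gflops_to_match(price_usd, ref_price_usd, ref_gflops):
    """GFLOPS a node costing price_usd needs to equal the reference $/GFLOPS."""
    return price_usd / dollars_per_gflops(ref_price_usd, ref_gflops)


# Reference point from the thread: 13 DP GFLOPS for $192.
print(f"{dollars_per_gflops(192, 13):.2f} $/GFLOPS")
# A $300 node would need at least this many DP GFLOPS to break even.
print(f"{gflops_to_match(300, 192, 13):.2f} GFLOPS")
```

This compares peak numbers only; sustained performance on a real
workload would shift the result.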
> For a prototyping cluster, small size isn't often a real driver (unless
> you're trying to pack it into a small box for some other reason: lunch box
> beowulf clusters that fit under a plane seat). Going to a more
> conventional (slightly larger) consumer oriented motherboard and an
> inexpensive consumer oriented power supply might actually give you better
> bang for the buck.
Sure, but (un)fortunately size and power consumption are very important
(I think more than price, within reasonable bounds of course). It would
be great if I could run a few nodes from a high-end battery in the
future. For example, assuming a Jetson TK1 draws about 20W, I should be
able to run about 4 nodes from a 100W lithium UPS battery. Correct me
if I introduced confusion here. I certainly have to learn a lot about
power consumption and clarify these requirements.
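
A rough sketch of that power budget, under stated assumptions: the 20W
per node and 100W supply are the figures above, while the 20% headroom
and the 200Wh battery capacity are placeholders I made up for
illustration. Note that a watt rating only limits how many nodes can run
at once; how long they run depends on the capacity in watt-hours, which
I have not pinned down yet.

```python
import math


def max_nodes(supply_watts, node_watts, headroom=0.8):
    """Nodes that fit within the supply's power rating.

    headroom is the fraction of the rating we allow ourselves to use
    (0.8 keeps a 20% margin; this margin is an assumption).
    """
    if node_watts <= 0:
        raise ValueError("node_watts must be positive")
    return math.floor(supply_watts * headroom / node_watts)


def runtime_hours(capacity_wh, nodes, node_watts):
    """Ideal runtime from a battery, ignoring conversion losses."""
    load_watts = nodes * node_watts
    if load_watts <= 0:
        raise ValueError("total load must be positive")
    return capacity_wh / load_watts


print(max_nodes(100, 20))                # with 20% margin
print(max_nodes(100, 20, headroom=1.0))  # no margin at all
print(runtime_hours(200, 4, 20))         # hypothetical 200Wh battery
```

With the margin this gives 4 nodes, and 5 with none, so "about 4" looks
like a sane starting point if the 20W figure holds under load.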
> Overclocking and cluster computing don't go together very well. Clusters
> are sufficiently complex beasts that you don't need the additional
> failure/flakiness/thermal management hassles that comes from overclocking.
I will remember this advice.
> > There is unleashed performance of
> >VideoCore IV GPU (24 SP GFLOPS) but there is no C compiler for that
> >(only reverse engineered assembly).
> Unless you really enjoy hacking at a very low level, you want to pick
> hardware for which YOU aren't responsible for making the OS and tools
> work. You want to spend your time on
> A) hardware assembly
> B) learning how to effectively use multiple nodes and a communications
Yes, I really enjoy low-level hacking; I'm a BIOS developer :P
Unfortunately, I would like to avoid a hobbyist approach in this
project. I'm learning the hard way that points A and B are most of the
work when building a cluster.
> Hah.. If you want a real low power/high performance.. Consider the
> teensy3.1, a sort of super arduino using the Freescale K20 processor based
> on the ARM Cortex architecture. 30mA, runs at 72 Mhz clock rate, does a
Hah! :) This one is really nice. Unfortunately it doesn't have an MMU,
so no Linux, and adapting code to this platform would probably require
a lot of effort.