[Beowulf] GPU based cluster for machine learning purposes
Piotr Król
pietrushnic at gmail.com
Thu Apr 10 05:28:12 PDT 2014
On Thu, Apr 10, 2014 at 01:44:30AM -0400, Mark Hahn wrote:
> >I'm considering proof of concept Beowulf cluster build for machine
> >learning purposes.
>
> you can't go wrong using cheap/PC/commodity parts. you'll also get the
> easiest access to tools/distros/etc.
>
I'm concerned about cluster size; I would like to keep it as small as
possible. Probably some Mini/Nano-ITX board would be good enough to beat
the Jetson TK1. I wonder what the whole setup would cost and how that
compares with the Jetson.
> >In short I need as good as possible double precision matrix
> >multiplication performance with small power consumption and size.
>
> TK1 appears to be SP-oriented (not surprisingly). it's a little unclear
> what its power dissipation is - I'd guess something in the 20W range for
> linpack.
>
> >Taking matrix multiplication into consideration I thought that GPU is
> >natural choice.
>
> well, maybe. you always save power by operating more units at lower clock,
> and GPU tends to embrace this approach. it's not like GPUs have some
> magically more efficient circuits otherwise. but it's probably worth
> looking at the gpu-linpack performance/watt from AMD's APU options. (though
> they contain higher-performance CPU and memory support than TK1.)
>
Very good point! Following your AMD APU advice I found this article:
http://www.anandtech.com/show/7711/floating-point-peak-performance-of-kaveri-and-other-recent-amd-and-intel-chips
I will rethink my configuration around an AMD APU + Mini/Nano-ITX
board and see whether I can get a better performance/price ratio.
> >curious about your professional opinion on this build.
>
> my professional opinion is that when people use the phrase "build"
> as a noun, they're coming from the PC/gamer world ;)
>
> sorry!
:) More PC than gamer, maybe my English is not good enough.
>
> >Questions that already came to my mind:
> >1. What is the most used diagnostic software for keeping a cluster up
> >and running?
>
> what failure modes are you thinking about? I use IPMI on my clusters,
> and wouldn't build a cluster without it.
>
I mean board power-on failures, bad blocks, overheating and other
hardware issues.
I don't know of any development board with a BMC; AFAIK it is typically a
server component. I agree that remote management ability is very important.
> >4. Theoretical max for this platform is 326 SP GFLOPS, I was able to
> >confirm that DP/SP ratio is 1/24 so theoretical max for DP is 13 GFLOPS.
> >Can someone elaborate or point me to documentation on how hard it will
> >be to utilize this power assuming CUDA and MPI usage.
>
> "utilize"? it's pretty low flops, so the onboard 2G will be plenty
> to keep it busy. otoh, the memory is only 64b wide (no mention of memory
> clock I've seen), so probably fairly low-bandwidth.
>
The spec lists the memory as DDR3L FBGA96, 256Mbit x 16, 933MHz, Hynix
H5TC4G63AFR-RDA.
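To put rough numbers on this, here is a minimal sketch of the arithmetic.
It assumes the 933MHz figure is the memory clock, that DDR3L moves two
transfers per clock, and that the bus is 64 bits wide as Mark says. The
function names are only illustrative, and these are theoretical peaks, not
measurements.

```python
def peak_bandwidth_gbs(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (assumes double data rate)."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


def dp_peak_gflops(sp_gflops, dp_sp_ratio):
    """Theoretical DP peak derived from the SP peak and the DP/SP ratio."""
    return sp_gflops * dp_sp_ratio


# Figures quoted in this thread: 933MHz, 64-bit bus, 326 SP GFLOPS, DP/SP = 1/24.
bandwidth = peak_bandwidth_gbs(933, 64)
dp_peak = dp_peak_gflops(326, 1 / 24)

# Flops per byte needed before the DP units, not the memory, become the limit.
balance = dp_peak / bandwidth

print(f"peak bandwidth: {bandwidth:.1f} GB/s")
print(f"DP peak:        {dp_peak:.1f} GFLOPS")
print(f"balance point:  {balance:.2f} flop/byte")
```

Under these assumptions the peak bandwidth comes out near 15 GB/s and the
DP peak near 13.6 GFLOPS, so a kernel needs roughly one DP flop per byte
moved before memory stops being the bottleneck.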
> >I'm open to any suggestions, even if it means changing everything in
> >this build :)
>
> IMO, you can learn everything you need to learn from 4-8 low-end PCs.
> there are certainly power differences versus an arm+low-end-gpu board
> like this, but since this device delivers pretty much token gflops,
> you might consider just using a raspberry pi or beaglebone if you have your
> heart set on avoiding the PC market.
I considered RPi and BeagleBone. I measured performance on the RPi and got
68 DP MFLOPS after overclocking. There is untapped performance in the
VideoCore IV GPU (24 SP GFLOPS), but there is no C compiler for it
(only reverse-engineered assembly). BeagleBone MX seems to reach about
50-60 MFLOPS according to this:
http://www.vesperix.com/arm/atlas-arm/bench/gcc-a8/index.html
So these boards are not comparable with the Jetson. I will take a look at
the Mini/Nano-ITX PC market.
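For comparing boards on equal terms, a minimal sketch of how a DP MFLOPS
figure can be derived from a timed matrix multiply is below. It uses the
conventional 2*n^3 operation count for an n x n multiply. The naive
pure-Python multiply is only a stand-in I made up for illustration; it
measures the interpreter, not the hardware, so a real run would pass in a
tuned BLAS dgemm (for example ATLAS, as in the link above).

```python
import time


def matmul_flops(n):
    """Conventional operation count for an n x n by n x n multiply."""
    return 2 * n ** 3


def mflops(n, seconds):
    """MFLOPS achieved for one n x n multiply that took `seconds`."""
    return matmul_flops(n) / seconds / 1e6


def naive_matmul(a, b):
    """Stand-in multiply on lists of Python floats (which are doubles)."""
    b_columns = list(zip(*b))  # transpose once so columns are easy to walk
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def benchmark(n=64, multiply=naive_matmul):
    """Time one multiply of two n x n matrices and return DP MFLOPS."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    multiply(a, b)
    elapsed = time.perf_counter() - start
    return mflops(n, elapsed)


if __name__ == "__main__":
    print(f"{benchmark():.1f} DP MFLOPS (naive Python, not a BLAS result)")
```

Swapping the `multiply` argument for a wrapper around a real dgemm keeps
the timing and the flop count the same, so the numbers from different
boards stay comparable.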
I appreciate your reply Mark, thanks.
Regards,
Piotr Król