[Beowulf] GPU based cluster for machine learning purposes
deadline at eadline.org
Thu Apr 10 09:07:45 PDT 2014
Mark and Jim have provided some sound advice. As to your
approach, I want to offer a data point.
Four mid-range Intel Haswell processors connected by
GbE can provide almost 500 GFLOPS of DP CPU performance
running HPL (the Top500 benchmark) without a GPU.
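As a rough check on that figure, you can work out the theoretical double-precision peak (Rpeak) of such a setup from core count, clock and FLOPs per cycle. The sketch below is not a spec for any particular part. It assumes quad-core processors at 3.4 GHz, because the post does not name a model, and it treats the 500 GFLOPS figure as the measured HPL result (Rmax). The 16 DP FLOPs per cycle per core comes from Haswell's two 256-bit FMA units: 4 doubles x 2 flops per FMA x 2 units.

```python
# Theoretical double-precision peak (Rpeak) for a small Haswell cluster.
# Assumptions (not from the post): quad-core parts at a 3.4 GHz sustained clock.
HASWELL_DP_FLOPS_PER_CYCLE = 16  # 2 FMA units x 4 doubles (AVX2) x 2 flops per FMA


def peak_gflops(nodes, cores_per_node, clock_ghz,
                flops_per_cycle=HASWELL_DP_FLOPS_PER_CYCLE):
    """Rpeak in GFLOPS = nodes * cores * clock (GHz) * flops per cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(rmax_gflops, rpeak_gflops):
    """Fraction of theoretical peak actually achieved by HPL (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    rpeak = peak_gflops(nodes=4, cores_per_node=4, clock_ghz=3.4)
    print(f"Rpeak ~ {rpeak:.1f} GFLOPS")
    print(f"HPL efficiency at 500 GFLOPS ~ {hpl_efficiency(500, rpeak):.0%}")
```

With these assumptions Rpeak is about 870 GFLOPS, so 500 GFLOPS would be roughly 57% of peak. A different core count or clock changes that ratio.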
Although you can buy a prebuilt box (disclosure: I sell them),
it is not that difficult to build your own system
using separate cases and some shelves. Or, as Jim suggests, cookie
sheets and double-stick tape.
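To compare with that data point, take the figures in the quoted message below: 326 SP GFLOPS and a 1/24 DP/SP ratio. They put one Jetson TK1 at roughly 13.6 DP GFLOPS of theoretical peak. This sketch assumes the "sub-10W" figure means about 10 W per board. It also compares board *peak* against a *measured* HPL number, so the board count it prints is only a lower bound.

```python
# Double-precision peak of one board from the figures quoted in the post,
# and how many boards it would take to reach a given aggregate figure.
# Assumption: ~10 W per board (the "sub-10W" figure from the post).
import math

SP_PEAK_GFLOPS = 326.0
DP_TO_SP_RATIO = 1 / 24


def dp_peak(sp_peak=SP_PEAK_GFLOPS, ratio=DP_TO_SP_RATIO):
    """Theoretical DP peak = SP peak * DP/SP throughput ratio."""
    return sp_peak * ratio


def boards_to_match(target_gflops, per_board_gflops):
    """Boards needed if every board ran at 100% of peak (a lower bound)."""
    return math.ceil(target_gflops / per_board_gflops)


if __name__ == "__main__":
    per_board = dp_peak()
    print(f"DP peak per board ~ {per_board:.1f} GFLOPS")
    print(f"DP GFLOPS per watt at 10 W ~ {per_board / 10:.2f}")
    print(f"Boards needed for 500 GFLOPS (at 100% of peak): "
          f"{boards_to_match(500, per_board)}")
```

Even at full peak, matching the four-node Haswell HPL result would take about 37 boards. Real DGEMM or HPL efficiency on each board would push that number higher.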
> Hi all,
> I'm considering a proof-of-concept Beowulf cluster build for machine
> learning purposes. My main requirement is that it should be based on
> embedded development boards (relatively low power consumption and
> size). In short, I need the best possible double-precision matrix
> multiplication performance at low power consumption and small size.
> Since matrix multiplication is the main workload, I thought a GPU was the
> natural choice. The best fit in this category that I was able to find is
> the brand-new Jetson TK1 from NVIDIA:
> If I missed something, please let me know. I don't have access to
> code to benchmark the memory and CPU consumption of these algorithms. I'm
> responsible for providing the hardware and system configuration, but I'm
> curious about your professional opinion on this build.
> Questions that have already come to mind:
> 1. What diagnostic software is most commonly used to keep a cluster up and
> running? Should I install it from outside the standard distro (e.g.
> Debian/Ubuntu) repositories for this kind of build, or are the standard
> tools enough?
> 2. The boards measure 5"x5" (12.7 cm x 12.7 cm). Where could I find a
> chassis or open-air frame for 16, 32 or 64 nodes if I have to extend
> my build? If you have any suggestions, I would be glad to hear them.
> 3. I'm not an electrical engineer, but I wonder whether there could be a
> problem with powering up 32/64 nodes at once. There is no wattage
> characterization data for this board yet, but I have seen some
> information suggesting it should be sub-10W.
> 4. The theoretical maximum for this platform is 326 SP GFLOPS, and I was able
> to confirm that the DP/SP ratio is 1/24, so the theoretical maximum for DP is
> about 13.6 GFLOPS. Can someone elaborate, or point me to documentation on,
> how hard it will be to utilize this power using CUDA and MPI?
> 5. The operating system resides on eMMC. Are there any reasons to switch to
> an SD card or an SSD (there is a SATA port on the board)?
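For question 3, a back-of-the-envelope power budget is just multiplication. This sketch takes the "sub-10W" figure as 10 W per board. It also applies a hypothetical 1.5x headroom factor to cover power-on surge and supply derating. Both numbers are assumptions, and you should replace them with measurements from a real board. The 500 W supply size is likewise only an example.

```python
# Rough power-supply sizing for a shelf of identical boards.
# Assumptions: ~10 W per board (the "sub-10W" figure), a hypothetical 1.5x
# headroom factor, and an example 500 W supply size.
import math


def total_load_watts(nodes, watts_per_node):
    """Steady-state load if every board draws the same power."""
    return nodes * watts_per_node


def supplies_needed(nodes, watts_per_node, psu_watts, headroom=1.5):
    """Number of psu_watts supplies needed to cover the load with headroom."""
    required = total_load_watts(nodes, watts_per_node) * headroom
    return math.ceil(required / psu_watts)


if __name__ == "__main__":
    for n in (16, 32, 64):
        load = total_load_watts(n, 10)
        count = supplies_needed(n, 10, psu_watts=500)
        print(f"{n} nodes: ~{load} W steady, {count} x 500 W supplies")
```

At these figures even 64 boards total about 640 W, which is within what ordinary PC-class supplies deliver. Staggering power-up across supplies is one common way to limit the surge at switch-on.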
> This is my first post to this list, so please excuse me if I have introduced
> some confusion.
> I'm open to any suggestions, even if it means changing everything in
> this build :)
> Best Regards,
> Piotr Król
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing