[Beowulf] GPU based cluster for machine learning purposes
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Thu Apr 10 06:35:15 PDT 2014
On 4/9/14 3:54 PM, "Piotr Król" <pietrushnic at gmail.com> wrote:
>I'm considering a proof-of-concept Beowulf cluster build for machine
>learning purposes. My main requirement is that it should be based on
>embedded development boards (relatively small power consumption and
>size). In short, I need the best possible double precision matrix
>multiplication performance with small power consumption and size.
Always nice... Bear in mind that for raw computational horsepower one big
fast computer usually beats a stack of smaller ones. It's when you get to
problems that are bigger than the one big fast computer can do that
clustering is worthwhile.
>Taking matrix multiplication into consideration I thought that GPU is
Maybe, maybe not. There's also the overhead of moving data in and out of
the GPU, and the question of whether there are optimized libraries that
take advantage of the GPU. It also kind of depends on whether you are doing
this to get "most computation for my budget" or "demonstrate feasibility for
future scale up" or whatever.
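One rough way to put numbers on the data-movement overhead is a back-of-envelope estimate like the sketch below. It assumes a dense n x n double precision multiply (about 2n^3 floating point operations, with three n x n matrices crossing the bus). The throughput and bandwidth figures passed in are placeholders, not measurements of any particular board.

```python
def matmul_offload_estimate(n, gpu_gflops, bus_gbytes_per_s, bytes_per_elem=8):
    """Return (compute_seconds, transfer_seconds) for one n x n matrix multiply.

    All inputs are assumptions supplied by the caller, not measured values.
    """
    flops = 2.0 * n ** 3                          # operations in a dense n x n product
    bytes_moved = 3.0 * n * n * bytes_per_elem    # A and B in, C out
    compute_s = flops / (gpu_gflops * 1e9)
    transfer_s = bytes_moved / (bus_gbytes_per_s * 1e9)
    return compute_s, transfer_s


if __name__ == "__main__":
    # Placeholder numbers: 10 GFLOPS sustained, 1 GB/s effective transfer rate.
    for n in (100, 1000, 4000):
        c, t = matmul_offload_estimate(n, gpu_gflops=10.0, bus_gbytes_per_s=1.0)
        print(f"n={n}: compute {c:.4f} s, transfer {t:.4f} s")
```

Since the arithmetic grows as n^3 and the transfer as n^2, small matrices tend to be dominated by the transfer and large ones by the computation.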
>2. The boards measure 5"x5" (12.7cmx12.7cm). I wonder where to find a
>chassis/open air frame for 16, 32 or 64 nodes if I have to extend
>my build. If you have any suggestions I would be glad to hear about them.
Nylon cable ties and threaded rods/nuts are one possibility.
I'm a big fan of double stick foam tape and large baking sheets. The
sheets are cheap and strong, and have inexpensive racks on which they can
Cooling a bunch of boards with one big fan is generally easier and quieter
than trying to pack everything into a tiny space and use individual fans.
Cooling will be your single biggest problem after cabling complexity.
>3. I'm not an electrical engineer, but I wonder if there could be a problem
>with powering up 32/64 nodes at once. There are no wattage
>characterization data for this board right now, but I saw some
>information that this board should be sub-10W.
That depends more on your power supplies. Do these boards have on board
DC/DC converters? Or do they run off a standard PC power supply with +5,
Bear in mind that 100 wall warts is a packaging challenge.
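For a first-cut power budget, the arithmetic can be sketched as below. The 10 W per board is the "sub-10W" figure from the question taken as an upper bound. The 25% headroom and the 12 V rail are illustrative assumptions, since the board's actual supply requirements aren't established here.

```python
def power_budget(nodes, watts_per_node=10.0, headroom=1.25, rail_volts=12.0):
    """Return (supply_watts, rail_amps) for a stack of identical boards.

    watts_per_node, headroom and rail_volts are assumptions, not board specs.
    """
    supply_watts = nodes * watts_per_node * headroom
    rail_amps = supply_watts / rail_volts
    return supply_watts, rail_amps


if __name__ == "__main__":
    for nodes in (16, 32, 64):
        watts, amps = power_budget(nodes)
        print(f"{nodes} nodes: {watts:.0f} W supply, {amps:.1f} A at 12 V")
```

The current matters as much as the wattage: at low rail voltages the distribution wiring has to carry tens of amps or more.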
>4. Theoretical max for this platform is 326 SP GFLOPS, I was able to
>confirm that DP/SP ratio is 1/24 so theoretical max for DP is 13 GFLOPS.
>Can someone elaborate or point me to documentation on how hard it will be
>to utilize this power assuming CUDA and MPI usage.
>5. The operating system resides on eMMC. Are there any reasons to switch to
>an SD card or SSD (there is a SATA port on board)?
What is your node-node interconnect fabric? GigE and a multi port switch?
For your task, is it going to be communication bound or compute bound?
If you don't know, build a small cluster, use commodity Ethernet as the
interconnect, and give it a try.
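Before building anything, a crude estimate can hint at which side of that line a job falls on. The sketch below compares per-step compute time with per-step network time. Every number in it (node throughput, link rate, latency, message counts) is a hypothetical input to be replaced with your own measurements.

```python
def step_times(flops_per_step, bytes_per_step, messages_per_step,
               node_gflops, link_gbit_per_s=1.0, latency_s=100e-6):
    """Return (compute_seconds, comm_seconds) for one step on one node.

    The defaults (1 Gbit/s link, 100 microsecond message latency) are
    illustrative assumptions, not measurements.
    """
    compute_s = flops_per_step / (node_gflops * 1e9)
    wire_s = bytes_per_step * 8 / (link_gbit_per_s * 1e9)   # bytes -> bits
    comm_s = wire_s + messages_per_step * latency_s
    return compute_s, comm_s


def classify(compute_s, comm_s):
    """Label a step by whichever cost dominates."""
    return "compute bound" if compute_s >= comm_s else "communication bound"


if __name__ == "__main__":
    # Hypothetical step: 2e9 flops, 24 MB exchanged in 3 messages, 10 GFLOPS node.
    c, m = step_times(2e9, 24e6, 3, node_gflops=10.0)
    print(f"compute {c:.4f} s, comm {m:.4f} s -> {classify(c, m)}")
```

A model like this only narrows things down; building the small cluster and timing the real job is still the test.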
You can learn a whole lot from building a 5-10 node cluster. Lots of
things like packaging, interconnects, cables, etc.
BTW, on "small cheap node" clusters, cabling can actually wind up being a
significant fraction of the total cost. 100 patch cords, 100 power cords,
100 of this, 100 of that, etc. And some bargain basement cables are more
trouble than they are worth. I got a box of 100 surplus Cat 5 cables for
something like $0.50 each and thought I had scored big time. Not at all.
Just crummy enough that I fought network problems for a couple of weeks
(not realizing they were network problems.. Small embedded computers often
don't have good diagnostic capability.. Hmm what can I figure out using
Also stuff like sharing a video monitor, or serial console port, etc.
While having 3 or 4 monitors and keyboards on your desk is reasonable for
a 4 node cluster, when you get to 10s of nodes, something else is
necessary. And ssh only works if the network connection is working, so
it's fine when you've got the cluster up and running, but not so hot
during the initial build up.