[Beowulf] Beowulf Cluster VS Hadoop/Spark

Fabricio Cannini fcannini at gmail.com
Fri Dec 30 08:49:34 PST 2016


On 30-12-2016 05:47, John Hanks wrote:
> This often gets presented as an either/or proposition and it's really
> not. We happily use SLURM to schedule the setup, run and teardown of
> Spark clusters. At the end of the day it's all software, even the kernel
> and OS. The big secret of HPC is that in a job scheduler we have an
> amazingly powerful tool to manage resources. Once you are scheduling
> Spark clusters, Hadoop clusters, VMs as jobs, containers, long-running
> web services, ..., you begin to feel sorry for those poor "cloud"
> people trapped in buzzword land.
>
> But, directly to your question: what we are learning as we dive deeper
> into Spark (interest in Hadoop here seems to be minimal and fading) is
> that it is just as hard, or maybe harder, to tune than MPI, and the
> people who want to use it tend to have a far looser grasp of how to tune
> it than those using MPI. In the short term I think it is beneficial as a
> sysadmin to spend some time learning the inner squishy bits to
> compensate for that. A simple wordcount example or search can show that
> wc and grep can often outperform Spark, and it takes some experience to
> understand when a particular approach is the better one for a given
> problem. (Where better is measured by efficiency, not by the number of
> cool new technical toys employed. :)
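
For anyone wondering what that setup/run/teardown pattern looks like in
practice, here is a minimal sketch of an sbatch script that stands up a
throwaway standalone Spark cluster inside an allocation. The paths,
the port, the sleep, and my_job.py are all assumptions; adjust for your
site:

    #!/bin/bash
    #SBATCH --job-name=spark-cluster
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=01:00:00

    # Assumed: a shared Spark install visible on every node, and the
    # default standalone master port (7077).
    export SPARK_HOME=/opt/spark
    MASTER_URL="spark://$(hostname -f):7077"

    # Setup: master daemon on the batch node, then one worker per
    # allocated node. The workers are run in the foreground via
    # spark-class so they live and die with the srun step.
    "$SPARK_HOME"/sbin/start-master.sh
    srun --ntasks="$SLURM_NNODES" --ntasks-per-node=1 \
        "$SPARK_HOME"/bin/spark-class \
        org.apache.spark.deploy.worker.Worker "$MASTER_URL" &
    sleep 15   # crude: give the workers a moment to register

    # Run: the actual application (my_job.py is a placeholder).
    "$SPARK_HOME"/bin/spark-submit --master "$MASTER_URL" my_job.py

    # Teardown: SLURM kills the worker step when the job ends; stopping
    # the master explicitly is just politer.
    "$SPARK_HOME"/sbin/stop-master.sh

The nice part is exactly what John describes: to the scheduler this is
just another job, so it queues, accounts and cleans up like everything
else on the machine.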

A good example of what John says about command-line tools
(I know, it's Hadoop, but you'll get the point):

http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
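
The gist of that post fits in a couple of pipelines; a sketch, assuming
a hypothetical corpus.txt that fits on local disk:

    # Word frequencies with stock command-line tools. For data that
    # fits on one node, this streaming pipeline often beats spinning up
    # a small Spark or Hadoop cluster for the same job.
    tr -s '[:space:]' '\n' < corpus.txt | sort | uniq -c | sort -rn | head -20

    # And a "search" job is just grep, running at disk speed
    # (some_pattern is a placeholder):
    grep -c 'some_pattern' corpus.txt

Everything streams in a single pass per stage and never leaves the
node, which is the article's whole point.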


