[Beowulf] ***UNCHECKED*** Re: Spark, Julia, OpenMPI etc. - all in one place

Douglas Eadline deadline at eadline.org
Wed Oct 14 08:31:23 PDT 2020


> cost was one factor that accelerated spark/hadoop, it's not the only
> or even the biggest factor.  the ML folks didn't start with MPi
> because the AI frameworks were bred on workstations and then ported to
> non HPC hardware (aka cloud platforms) where MPI isn't the dominant
> paradigm.  now that ML/AI is taking hold in the HPC community for
> different aspects and the models are starting to expand beyond the 4-8
> gpus you can stick in a single box they are adding MPI underneath
> (look at horovad) to spread the models over multiple machines (scale
> out vs scale up).
>

A few things. Hadoop is platform and framework for running
Map Reduce (MR) at scale. Parallel MR is a highly
defined algorithm with highly defined data flows. Data are
sliced across server storage so the map stage can be run in
parallel. Spark works the same way only by default data are
sliced across memory on servers. Hadoop now does this
as well.

The Map stage can be anything you want it to be. MR is a SIMD
algorithm, so if your application fits the algorithm, MR
*might* be useful.

Spark does have machine learning (Spark MLib) And, not all ML is
GPUs and neural nets, there are many statistics
based libraries for ML, Scikit Learn for instance.

The basic Spark MLib libraries are scalable, however,
and use statistical methods. There are now libraries that
use neural nets as well and Spark V3 support a tight
integration with things like Tensor Flow etc.

IMO, both Hadoop and Spark did not use MPI because they had
a highly defined algorithm with specific performance goals.
Many MR jobs, like those with Hadoop are dynamic, requiring a
varied resource load over the course of their lifetime.
(Mapping uses a lot of resources, Reducing usually uses much less)

Thus, the Hadoop scheduler, YARN, can dynamically reduce or
increase the resources assigned to a running job. MPI does not
provide such a dynamic resource allocation.
Basically, MPI did not address their project goals.
The authors were certainly aware of MPI (I worked with
some of them on a book about YARN)

--
Doug




-- 
Doug



More information about the Beowulf mailing list