[Beowulf] Clustering vs Hadoop/spark
    Jonathan Aquilina 
    jaquilina at eagleeyet.net
       
    Tue Nov 24 08:19:52 UTC 2020
    
    
  
Hi Ben,
Re-added the list.
I think where I'm confused is this: isn't that what Hadoop/Spark does? It distributes the data for computation and then aggregates the results back into a single data set.
Correct me if I am wrong here. 
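To make concrete what I mean by "distribute, then aggregate", here is a minimal single-machine sketch in Python. It is not Hadoop or Spark code; the function names (partition, count_words, merge, word_count) are made up for illustration, and the chunks are processed one after another here, whereas a real framework would process them on different nodes.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Split the records into n_parts chunks (the 'distribute' step)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_words(chunk):
    """Work done independently on one chunk (the 'map' step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def merge(a, b):
    """Combine two partial results (the 'reduce' / aggregate step)."""
    return a + b


def word_count(records, n_parts=3):
    # Each chunk could be handled by a different worker; here it is a plain loop.
    partials = [count_words(chunk) for chunk in partition(records, n_parts)]
    # Fold the partial counts back into a single result.
    return reduce(merge, partials, Counter())
```

Calling word_count(["a b", "b c", "a a"]) gives one combined Counter no matter how many chunks the input was split into.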
Another thing I can't seem to understand is how, for big data analytics, a Java-based platform manages to get such good performance when crunching large data sets.
Regards,
Jonathan
-----Original Message-----
From: Benjamin Redling <benjamin.rampe at uni-jena.de> 
Sent: 24 November 2020 09:03
To: Jonathan Aquilina <jaquilina at eagleeyet.net>
Subject: Re: [Beowulf] Clustering vs Hadoop/spark
Hello Jonathan,
On 24/11/2020 06.22, Jonathan Aquilina via Beowulf wrote:
> I am just wondering what advantages does setting up of a cluster have 
> in relation to big data analytics vs using something like Hadoop/spark?
Can you distribute any application without programming against a framework?
We distribute a lot of data parallel tasks with the source code unchanged via SLURM.
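A minimal sketch of how such a data-parallel job can look with a SLURM job array, assuming one input file per array task. This is an illustration only, not the setup described above: the helper names (pick_input, run_task) and the file names are hypothetical. The only SLURM-specific piece is the SLURM_ARRAY_TASK_ID environment variable, which SLURM sets for each task of an array job. The application itself is started as an ordinary external program and is not modified.

```python
import os
import subprocess


def pick_input(inputs, env=os.environ):
    """Return the input assigned to this array task.

    SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array;
    outside SLURM we fall back to task 0 (an assumption of this sketch).
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    return inputs[task_id]


def run_task(program, inputs, env=os.environ):
    """Run the unchanged program on this task's input file."""
    input_path = pick_input(inputs, env)
    # The program is called exactly as it would be on a single machine.
    return subprocess.run([program, input_path], check=True)
```

With three input files, a small script calling run_task could be submitted with something like sbatch --array=0-2 --wrap="python3 run_task.py" (run_task.py being a hypothetical script name), so that each array task processes one file independently.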
Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Redling
☎  +49 3641 9 44323
    
    