MapReduce: Simplified Data Processing on Large Clusters

http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf

MapReduce: The programming model and practice

https://ai.google/research/pubs/pub36249

Google’s MapReduce programming model — Revisited

https://www.sciencedirect.com/science/article/pii/S0167642307001281

Exploring Wikipedia with Apache Spark: A Live Coding Demo

https://www.infoq.com/presentations/wikipedia-apache-spark


Besides in-memory data, Apache Spark draws on ideas from functional programming (immutable data, operations on data expressed as functional transformations…), which explains why Scala is, for the moment, the lingua franca of Apache Spark and Big Data.
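As a rough illustration of the MapReduce programming model from the OSDI'04 paper (not Spark's actual API), the map, shuffle, and reduce phases can be sketched in plain Python; the function names here are illustrative, and each phase treats its input as immutable, in the functional style mentioned above:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the user-supplied mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle_phase(pairs):
    """Group intermediate pairs by key, as the MapReduce runtime would."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped, reducer):
    """Apply the user-supplied reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in grouped}

# The classic word-count example from the paper.
def word_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, counts):
    return sum(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(docs, word_mapper)), count_reducer)
# counts["the"] == 3, counts["fox"] == 2
```

In the real system the shuffle is a distributed sort across machines; this single-process version only shows the shape of the user-facing model.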


Matei Zaharia, the creator of Spark, explains the history of Spark, starting from MapReduce and Hadoop.

The driving force behind Big Data is Machine Learning: MapReduce -> Hadoop -> Big Data (Spark) -> Machine Learning