Cloud Academy – Running Spark on Azure Databricks
English | Size: 575.35 MB
Category: Tutorial
Apache Spark is an open-source framework for big data processing. It was developed as a replacement for Apache Hadoop’s MapReduce framework. Both Spark and MapReduce process data on compute clusters, but one of Spark’s main advantages is that it can keep intermediate data in memory, whereas MapReduce writes intermediate results to disk between processing stages. For workloads that reuse the same data repeatedly, such as iterative algorithms, this in-memory processing can be orders of magnitude faster. There are plenty of other differences between the two systems as well, but we don’t need to go into the details here.