In 2026, Apache Spark will still be one of the dominant data processing engines.

Here is a list of articles to help you dive deep into this famous engine:

◉ Apache Spark overview: Architecture, Job, Stage, Task, RDD, the journey of the Spark application

vutr.substack.com/p/the…

◉ Spark resource allocation: Static vs dynamic allocation, FIFO vs Fair scheduling mode

vutr.substack.com/p/i-s…

◉ Spark scheduling process: from your code to physical execution on executors

vutr.substack.com/p/i-s…

◉ Spark planning process: Catalyst, logical vs physical planning, Adaptive Query Execution in Spark 3

vutr.substack.com/p/i-s…

◉ Spark's memory management: On-heap and Off-heap memory

vutr.substack.com/p/i-s…

◉ Databricks' Spark vs open-source Spark: Spark + Photon engine to boost query performance

vutr.substack.com/p/how…

◉ PySpark: Spark was written in Scala, so how could we use Python with it?

vutr.substack.com/p/i-s…

◉ Spark Structured Streaming: the micro-batch processing engine

vutr.substack.com/p/eve…

◉ Spark Connect: process data in Spark by making an API request instead of submitting an application

vutr.substack.com/p/is-…

Hope they can help you on your Spark learning journey.

--

I'm writing articles for 𝟭𝟳,𝟬𝟬𝟬+ data engineers worldwide. Join the community for 𝗙𝗥𝗘𝗘 at vutr.substack.com

Jan 5 at 8:48 AM