In 2026, Apache Spark will still be one of the dominant data processing engines.
Here is a list of articles to help you dive deep into this famous engine:
◉ Apache Spark overview: Architecture, Job, Stage, Task, RDD, the journey of the Spark application
vutr.substack.com/p/the…
◉ Spark resource allocation: Static vs dynamic allocation, FIFO vs FAIR scheduling modes
vutr.substack.com/p/i-s…
◉ Spark scheduling process: from your code to physical execution on executors
vutr.substack.com/p/i-s…
◉ Spark planning process: Catalyst, logical vs physical planning, Adaptive Query Execution in Spark 3
vutr.substack.com/p/i-s…
◉ Spark's memory management: On-heap and Off-heap memory
vutr.substack.com/p/i-s…
◉ Databricks' Spark vs open-source Spark: Spark + Photon engine to boost query performance
vutr.substack.com/p/how…
◉ PySpark: Spark is written in Scala, so how can we use Python with it?
vutr.substack.com/p/i-s…
◉ Spark Structured Streaming: the micro-batch processing engine
vutr.substack.com/p/eve…
◉ Spark Connect: process data in Spark by making an API request instead of submitting an application
vutr.substack.com/p/is-…
I hope they help you on your Spark learning journey.
--
I'm writing articles for 𝟭𝟳,𝟬𝟬𝟬+ data engineers worldwide. Join the community for 𝗙𝗥𝗘𝗘 at vutr.substack.com