If you want to learn data engineering in 2026, read these blog posts:
◉ Data Modeling
- Kimball Overview: lnkd.in/gYSTUUDt
- Kimball + dbt: lnkd.in/gKGeiXSP
- SCD: lnkd.in/gcBcrhtd
◉ SQL: lnkd.in/g9Z7RTnA
◉ OLAP
- Insights help you learn: lnkd.in/gBye2W-b
- OLTP vs OLAP, Making changes to the data: lnkd.in/gUBRrNiY
- OLTP vs OLAP, Data Format and Indexing: lnkd.in/gbgcQ3eX
- Partitioning vs Clustering: lnkd.in/d46eyiX7
- How do databases execute joins: lnkd.in/eyPR_JJ4
- Internals of BigQuery, Snowflake, Databricks, Redshift: lnkd.in/gcykRFac
- Internals of DuckDB: lnkd.in/eE9MAeA9
- Why we need open table formats (OTFs): lnkd.in/gBW73TFB
- How OTFs ensure ACID: lnkd.in/dKMwwZR6
- Considerations when building a lakehouse: lnkd.in/gmGsXv3i
- Iceberg: lnkd.in/gbuMy8TT
- Delta Lake: lnkd.in/gumU4XcG
- Hudi: lnkd.in/g4X7QSck
◉ dbt: lnkd.in/gBg4Z8e7
◉ Data formats
- CSV, JSON, Avro, Parquet: lnkd.in/gnUssk82
- Parquet deep dive: lnkd.in/gAwdG3pd
- Apache Arrow deep dive: lnkd.in/gsZk6F2R
◉ Spark
- All fundamentals: lnkd.in/g8bzn5Fk
- Overview: lnkd.in/gxrkaAk3
- Resource allocation: lnkd.in/ga-wX-j6
- Scheduling process: lnkd.in/g7VrtHv3
- Planning: lnkd.in/gcCUpKev
- Memory management: lnkd.in/gPpyFK7j
- Databricks Spark vs open-source Spark: lnkd.in/gSN7Vk33
- PySpark: lnkd.in/gwA-BWdp
- Spark Streaming: lnkd.in/gQMywJsz
- Spark Connect: lnkd.in/gfXsbQ35
◉ Single-node processing engine (e.g., Polars)
- Why a single-node engine: lnkd.in/g3cPTnXx
◉ Airflow
- Overview: lnkd.in/dVqxe3yF
- Executors: lnkd.in/dp5cfsqs
◉ Git: lnkd.in/gAGFuckM
◉ Docker: lnkd.in/gPsWUVZN
◉ Kubernetes
- Overview: lnkd.in/g58Y2gRR
- k8s + Spark: lnkd.in/gJ_dUu9j
◉ Kafka: lnkd.in/gSjKHWwW
◉ Flink
- Overview: lnkd.in/gvfTwMiu
- What makes Flink fast: lnkd.in/gni3yxcx
◉ AI
- LLMs: lnkd.in/gxWS5ZNK
- Vector DB: lnkd.in/g-GgqRzM
◉ Side projects
- Lakehouse on laptop: lnkd.in/gqQ4FaBT
(To be updated...)
--
I'm writing articles for 17,000+ data engineers worldwide. Join the community with a 50% discount on the annual subscription now: