𝗪𝗵𝘆 𝗡𝗲𝘁𝗳𝗹𝗶𝘅
𝗱𝗲𝗰𝗶𝗱𝗲𝗱 𝘁𝗼 𝗮𝗯𝗮𝗻𝗱𝗼𝗻 𝘁𝗵𝗲 𝗖𝗤𝗥𝗦 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵
Netflix launched 𝗧𝘂𝗱𝘂𝗺 in late 2021 as its official fan website, a destination for exclusive interviews, behind-the-scenes content, and show-related material.
The team built it using CQRS to optimize read performance for serving content to fans.
The write path utilized a third-party CMS with a dedicated ingestion service that handled content updates via webhooks.
This service transformed CMS data into a read-optimized format and published it to Kafka.
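The write path described above can be sketched as a small webhook handler: parse the CMS payload, reshape it into a read-optimized record, and publish it to Kafka. This is an illustrative sketch only — the function names, payload fields, and topic name are assumptions, not Netflix's actual implementation.

```python
import json

def transform_entry(cms_entry: dict) -> dict:
    """Reshape a raw CMS entry into the read-optimized record (hypothetical fields)."""
    return {
        "page_id": cms_entry["id"],
        "title": cms_entry["fields"]["title"],
        "body": cms_entry["fields"]["body"],
        "version": cms_entry["sys"]["version"],
    }

def handle_cms_webhook(payload: str, producer, topic: str = "tudum-pages") -> dict:
    """Webhook entry point: parse, transform, and publish to Kafka."""
    entry = json.loads(payload)
    record = transform_entry(entry)
    # producer stands in for a Kafka producer client
    producer.send(topic, key=record["page_id"], value=record)
    return record
```

The key point is that all transformation work happens once, at write time, so the read side never touches the CMS.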
But their CQRS implementation couldn't deliver the read-after-write consistency their editorial workflow required.
Tudum launched with standard CQRS separation:
Write path: third-party CMS with webhook-driven ingestion service.
Read path: Kafka → Cassandra → Page Data Service with near cache → Page Construction Service.
The ingestion pipeline transformed CMS data into read-optimized format (CDN URLs instead of asset IDs, hydrated movie metadata instead of placeholders). Kafka decoupled writes from reads, and Cassandra provided the query store.
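The two read-optimizations mentioned — swapping asset IDs for CDN URLs and hydrating movie placeholders with metadata — might look like this. The lookup tables and field names here are hypothetical, for illustration only.

```python
# Hypothetical write-time lookup tables (in reality, resolved via CMS/metadata APIs).
ASSET_CDN = {"asset-42": "https://cdn.example.com/img/42.jpg"}
MOVIE_METADATA = {"movie-7": {"title": "Stranger Things", "year": 2016}}

def optimize_for_reads(raw: dict) -> dict:
    """Resolve IDs and placeholders at write time, so reads need no extra lookups."""
    return {
        "hero_image": ASSET_CDN[raw["hero_asset_id"]],   # asset ID -> CDN URL
        "movie": MOVIE_METADATA[raw["movie_id"]],        # placeholder -> hydrated metadata
        "body": raw["body"],
    }
```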
But they had 𝗮 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝘆 𝗽𝗿𝗼𝗯𝗹𝗲𝗺. Editorial previews required strong read-after-write consistency, but the architecture delivered eventual consistency with significant delays:
1. CMS webhook triggers the ingestion service
2. Ingestion service queries CMS APIs, validates, transforms, and produces to Kafka
3. Data Service Consumer processes Kafka message, writes to Cassandra
4. Near cache refreshes on a 60-second cycle
With the cache refreshing roughly one key per second, a full pass over N keys takes N seconds, so worst-case staleness grows with the size of the dataset. As content grew, preview delays stretched to over 30 seconds.
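The back-of-envelope behind that delay is simple: if the cache walks its keys at a fixed rate, a just-written key may have to wait for an entire pass before becoming visible, so staleness scales linearly with the key count. A rough sketch:

```python
def worst_case_staleness_seconds(num_keys: int, keys_per_second: float = 1.0) -> float:
    """A freshly written key may wait a full refresh pass before it is
    visible; the pass over all keys takes num_keys / keys_per_second."""
    return num_keys / keys_per_second

# Doubling the content roughly doubles the delay an editor sees in preview.
```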
So, Netflix replaced Kafka, Cassandra, and the cache layer with 𝗥𝗔𝗪 𝗛𝗼𝗹𝗹𝗼𝘄, their in-memory object database.
Key characteristics:
✅ Entire dataset is distributed across the application cluster memory
✅ Compression reduces the memory footprint to 25% of the uncompressed size
✅ Eventual consistency by default, strong consistency as a per-request option
✅ O(1) synchronous data access, no I/O per request
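The consistency model in that list can be illustrated with a toy replica: reads default to a local in-memory snapshot (eventually consistent, O(1), no I/O), while a per-request flag routes through the primary for strong consistency. This is a sketch of the idea only, not RAW Hollow's actual API.

```python
class Primary:
    """Authoritative store that receives all writes."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

class Replica:
    """Holds the full dataset in local memory; syncs from the primary."""
    def __init__(self, primary: Primary):
        self.primary = primary
        self.snapshot = {}

    def sync(self):
        self.snapshot = dict(self.primary.data)

    def read(self, key, strong: bool = False):
        if strong:
            # Per-request strong consistency: ask the primary directly.
            return self.primary.data.get(key)
        # Default: O(1) lookup in the local snapshot, possibly stale.
        return self.snapshot.get(key)
```

An editor's preview request would set the strong flag; regular fan traffic takes the cheap in-memory path.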
This enabled them to 𝗮𝗰𝗵𝗶𝗲𝘃𝗲 𝘀𝗶𝗴𝗻𝗶𝗳𝗶𝗰𝗮𝗻𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀, such as reducing editorial previews from minutes to seconds and page construction from 1.4 seconds to 0.4 seconds.
CQRS optimizes for scale and separation of concerns. However, when your dataset fits in memory and you require strong consistency, simpler approaches often prove more effective.
The "right" pattern depends on your specific constraints. Netflix optimized for editorial workflow, rather than theoretical scalability limits they hadn't yet reached.
Architecture decisions should align with actual requirements, not anticipated ones.
Image: Netflix