This is probably the most honest AI architecture breakdown on the internet right now.
9-layer AI production architecture :
services/ - RAG pipeline, semantic cache, memory, query rewriter, router. Not one file. Five.
agents/ - document grader, decomposer, adaptive router. Self-correcting by design.
prompts/ - versioned, typed, registered. Never hardcoded.
security/ - input, content, output. Three guards, not one.
evaluation/ - golden dataset, offline eval, online monitor. Most people skip this entire layer and ship blind.
observability/ - per-stage tracing, feedback linked to traces, cost per query.
.claude/ - agent context so your AI coding assistant knows the codebase before it touches a file.
The demo is one file. Production is this.