Most AI agents fail in production — not because the prompt was wrong, but because nobody did the hard work. Robust systems demand rigorous evals, controlled experiments, and serious engineering discipline. Cleverness gets you a demo. Data science gets you reliability.