Pretraining gave language models knowledge of the world through text. Reinforcement learning shaped their behavior against verifiable signals. The ReAct loop gave them a way to act in environments. Harness engineering made the loop reliable. Inference-time reasoning thickened the thought inside each turn. Protocols let agents reach tools and each other. World models and vision-language-action models are now extending all of this into physical environments.
And yet. The summit on the diagram has not moved. The intention gap has not narrowed. It is the same gap, now better characterized, examined from more angles, wrapped in more engineering, but structurally unchanged.