Thank you for this essay. Your writing brings out one of the most damning parts of agentic system development: non-determinism. When you ship an agent, its workflow is usually fixed at ship time. But the outputs of the underlying LLM pipeline will always drift ever so slightly, because of the non-deterministic nature of these frontier language models. So the workflow also has to adjust during post-production operation, and normally it doesn't. This makes the agent unreliable and testing it nearly impossible. It's why so many eval and observability platforms (LangSmith, Arize, W&B) are proliferating, yet they don't actually fix the issue. I have to do it manually, and that sucks.

May 18 at 5:34 PM