I have been building and operating Agentic AI Systems for the past few years, and the same patterns keep emerging.
Evaluation Driven Development is the most reliable way to build your Agentic Systems successfully and keep improving them. Here is my template.
Let's zoom in:
1. Define the problem you want to solve: is GenAI even needed?
2. Build a Prototype: figure out whether the solution is feasible.
3. Define Performance Metrics: decide which output metrics you will use to measure the success of your application.
4. Define Evals: break the output metrics down into smaller input metrics that drive them. Decompose those into tasks that can be automated and that move the input metrics, then define Evals for each task. Store the Evals in your Observability Platform.
ℹ️ Steps 1-4 are where AI Product Managers can help, though AI Engineers can also handle them.
5. Build a PoC: it can be simple (an Excel sheet) or more complex (a user-facing UI). Whatever form it takes, put it in front of users for feedback as soon as possible.
6. Instrument your application: gather traces and human feedback and store them in your Observability Platform alongside the Evals you stored earlier.
7. Run Evals on traced data: traces contain the inputs and outputs of your application, so run your Evals on top of them.
8. Analyse failing Evals and negative user feedback: this data is gold because it pinpoints exactly where the Agentic System needs improvement.
9. Improve your application: use the data from the previous step to refine prompts, improve the AI system topology, fine-tune models, etc. Make sure the changes move the Evals in the right direction.
10. Build the improved application and expose it to users.
11. Monitor the application in production: this comes out of the box, since the Evals and traces you implemented for development can be reused for monitoring. Configure specific alerting thresholds and enjoy the peace of mind.
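To make steps 4, 6, 7, 8 and 11 a bit more concrete, here is a minimal sketch in plain Python. It assumes a trace is just an input/output record with optional thumbs-up/down feedback, and that an Eval is a named pass/fail function over one trace. `Trace`, `Eval`, `run_evals`, `to_review` and `alerts` are hypothetical names for illustration, not the API of any Observability Platform, and the two Evals are toy deterministic checks standing in for whatever Evals you defined in step 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Trace:
    """One traced call of the application (step 6): what went in, what came out."""
    trace_id: str
    input: str
    output: str
    # Hypothetical encoding of human feedback: +1 thumbs up, -1 thumbs down, None if absent.
    user_feedback: Optional[int] = None


@dataclass
class Eval:
    """A named pass/fail check over a single trace (step 4)."""
    name: str
    check: Callable[[Trace], bool]


def run_evals(traces: List[Trace], evals: List[Eval]) -> Dict[str, dict]:
    """Step 7: run every Eval on every trace; return pass rate and failing trace ids per Eval."""
    results: Dict[str, dict] = {}
    for ev in evals:
        failing = [t.trace_id for t in traces if not ev.check(t)]
        pass_rate = 1 - len(failing) / len(traces) if traces else 1.0
        results[ev.name] = {"pass_rate": pass_rate, "failing": failing}
    return results


def to_review(traces: List[Trace], results: Dict[str, dict]) -> List[str]:
    """Step 8: trace ids that failed any Eval or received negative user feedback."""
    failed = {tid for r in results.values() for tid in r["failing"]}
    return [
        t.trace_id
        for t in traces
        if t.trace_id in failed or (t.user_feedback is not None and t.user_feedback < 0)
    ]


def alerts(results: Dict[str, dict], thresholds: Dict[str, float]) -> List[str]:
    """Step 11: names of Evals whose pass rate fell below their alerting threshold."""
    return [name for name, r in results.items() if r["pass_rate"] < thresholds.get(name, 0.0)]


# Toy traces and deterministic Evals, purely for illustration.
traces = [
    Trace("t1", "What is your return policy?", "You can return items within 30 days.", 1),
    Trace("t2", "Add milk to my cart", "", -1),
    Trace("t3", "Hi", "Hello! How can I help?", -1),
]
evals = [
    Eval("non_empty", lambda t: bool(t.output.strip())),
    Eval("under_200_chars", lambda t: len(t.output) <= 200),
]

results = run_evals(traces, evals)
print(results)
print(to_review(traces, results))
print(alerts(results, {"non_empty": 0.9}))
```

Here `t2` is flagged by a failing Eval while `t3` passes every Eval but still lands in the review queue because of negative feedback, which is why step 8 looks at both signals. For step 9, one simple check is to run `run_evals` on traces from the old and new versions of the application and compare the pass rates before releasing.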
Continuous Development of your application:
➡️ Run steps 6-10 to continuously improve and evolve your application.
➡️ As the application grows in complexity, you can add new requirements to it: run steps 1-5 for each one and attach the new logic as a route in your Agentic System.
➡️ For example, you start off with a simple chatbot and add a route that classifies user intent and takes action (e.g. adding items to a shopping cart).
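The "new logic as routes" idea can be sketched as a small dispatch table: a classifier picks an intent and the message is handed to the matching route. This is a minimal sketch under loud assumptions: `classify_intent` is a hypothetical keyword classifier standing in for what would more likely be an LLM call, the cart is an in-memory list rather than a real service, and all names are illustrative.

```python
import re
from typing import Callable, Dict, List

# In-memory cart, standing in for a real shopping-cart service (hypothetical).
cart: List[str] = []


def classify_intent(message: str) -> str:
    """Hypothetical keyword classifier; in practice likely an LLM call with its own Evals."""
    if re.search(r"\badd\b.*\bcart\b", message, re.IGNORECASE):
        return "add_to_cart"
    return "chat"


def chat(message: str) -> str:
    # Placeholder for the original chatbot route.
    return "(chatbot reply)"


def add_to_cart(message: str) -> str:
    """New route: extract the item name and act on it."""
    match = re.search(r"\badd\s+(.+?)\s+to\s+(?:my\s+)?cart\b", message, re.IGNORECASE)
    if match is None:
        return "Sorry, which item should I add?"
    item = match.group(1)
    cart.append(item)
    return f"Added {item} to your cart."


# Adding a capability means adding an entry here (plus the Evals defined for it).
ROUTES: Dict[str, Callable[[str], str]] = {"chat": chat, "add_to_cart": add_to_cart}


def handle(message: str) -> str:
    """Classify the message, then dispatch it to the matching route."""
    return ROUTES[classify_intent(message)](message)


print(handle("What is your return policy?"))
print(handle("Please add milk to my cart"))
print(cart)
```

Keeping each route behind its own entry means the classifier and every route can get their own Evals and traces, so a new capability goes through steps 1-5 without disturbing the existing chatbot.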
Learn all the practices of Eval Driven Development hands-on in my End-to-end AI Engineering Bootcamp:
Grab your 15% discount by applying code KICKOFF15 at checkout.
What is your experience with evolving Agentic Systems? Let me know in the comments.