Every day, 100+ people ask me, "How can I learn AI evals?"
Every time, I copy-paste these 10 links:
Using LLM-as-a-judge: hamel.dev/blog/posts/ll…
Demystifying evals for AI agents: anthropic.com/engineeri…
There are only 6 RAG Evals: jxnl.co/writing/2025/05…
Evaluation-driven development: decodingai.com/p/stop-l…
Binary evals vs. Likert scales: decodingai.com/p/the-5-…
The mirage of generic AI metrics: decodingai.com/p/the-mi…
Error analysis: youtube.com/watch?v=e2i…
Carrying out error analysis: youtube.com/watch?v=JoA…
Evaluating the effectiveness of LLM-evaluators: eugeneyan.com/writing/l…
LLM judges aren't the shortcut you think: youtube.com/watch?v=sEM…
Binge these to sharpen your AI evals skills.