Future AGI (@futureagi)

LLM evaluation is usually assigned to one team, and that tends to create problems: too many rubrics with no clear priority, thresholds that get adjusted under launch pressure, and safety checks that go unreviewed.

The issue is structural rather than technical. Evaluation is three separate jobs, and each needs its own owner: a platform team for the tooling, product teams for the rubrics, and a quality council for policy and deploy authority. Leave one out and the tooling degrades, the rubrics drift from real user needs, or there is no shared standard for what counts as good.

Our latest piece covers the split, the decision rights that make it work, how it changes with company size, and why the first step is usually forming the council. Read it here: shorturl.at/sTS9H

Jun 17

3:00 PM
