Built Claude Code skills for months. Zero proof they worked. Half the time Claude didn't even use them.
Anthropic's Skill Creator fixes this. Evals, A/B testing between versions, pass rate tracking, and trigger optimization so skills fire when they should.
Now you'll know whether a skill helps or hurts performance as the model improves. And for multi-step workflows, you can verify each step ran.
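
A rough sketch of what comparing pass rates between two skill versions can look like. This is not Skill Creator's actual API or output format. `EvalResult`, `pass_rate`, and `compare_versions` are hypothetical names made up for illustration, and the sample results are invented.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One eval case run against one skill version (hypothetical record shape)."""
    case_id: str
    passed: bool


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of eval cases that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def compare_versions(baseline: list[EvalResult], candidate: list[EvalResult]) -> str:
    """Say whether the candidate version helps or hurts relative to the baseline."""
    delta = pass_rate(candidate) - pass_rate(baseline)
    if delta > 0:
        return "candidate helps"
    if delta < 0:
        return "candidate hurts"
    return "no difference"


# Invented sample data: the same four eval cases run against two skill versions.
v1 = [EvalResult("a", True), EvalResult("b", False),
      EvalResult("c", True), EvalResult("d", False)]
v2 = [EvalResult("a", True), EvalResult("b", True),
      EvalResult("c", True), EvalResult("d", False)]

verdict = compare_versions(v1, v2)
```

Rerunning the same cases after a model update and comparing again is one way to see whether a skill still earns its place. With only a handful of cases, a small difference in pass rate can be noise.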