Joe Bachir (@joebachir) · Mar 17

Inside the Forward Pass: Can Transformer Internals Predict Correctness?

TL;DR: Internal transformer signals (entropy, attention, hidden state statistics) predict generation correctness with AUROC 0.60–0.90 under grouped held-out evaluation, without looking at the output text. The first 10 generated tokens carry most of the predictive signal for code tasks. Model confidence scores are nearly uncorrelated with correctness for…
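The kind of signal described in the TL;DR can be sketched in a few lines: compute the softmax entropy of each of the first few generated tokens from the model's logits, use (negated) mean early entropy as a correctness score, and evaluate it with AUROC. This is a minimal illustration, not the post's actual method; `token_entropy`, `early_entropy_score`, and the pure-Python AUROC are assumed helpers, and real use would pull logits from an actual model's forward pass.

```python
import math

def token_entropy(logits):
    # Softmax entropy (in nats) of one token's logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_entropy_score(per_token_logits, k=10):
    # Mean entropy over the first k generated tokens. The TL;DR's claim
    # is that these early tokens carry most of the predictive signal,
    # so we negate the mean: lower entropy -> higher correctness score.
    head = per_token_logits[:k]
    return -sum(token_entropy(t) for t in head) / len(head)

def auroc(scores, labels):
    # AUROC via the rank-sum (Mann-Whitney U) formulation:
    # probability a random positive outscores a random negative.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a grouped held-out evaluation as mentioned in the TL;DR, scores and labels would be split so that all generations from one task (or one prompt group) land entirely in train or entirely in test, preventing leakage between groups.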