Researchers quietly slipped AI-written briefs into the Jessup International Law Moot Court—the world’s premier student advocacy contest—and those entries earned “average to near-perfect” marks for presentation. A closer examination, however, revealed invented citations and shaky reasoning, showing that polished form can mask flimsy substance.
This isn’t a universal failure, but it happens often enough to matter. Benchmarks like AbstentionBench show models internally registering doubt while still giving confidently phrased answers. The good news is that this should be fixable: models need fine-tuning to state their uncertainty and confidence explicitly. Otherwise, we risk hiding failures behind clever rhetoric.
Jun 19 at 4:40 PM