Azeem Azhar (@exponentialview)

Jun 19, 2025

Researchers quietly slipped AI-written briefs into the Jessup International Law Moot Court—the world’s premier student advocacy contest—and those entries earned “average to near-perfect” marks for presentation. A closer examination, however, revealed invented citations and shaky reasoning, showing that polished form can mask flimsy substance.

This isn’t a universal failure, but it happens often enough to matter. Benchmarks like AbstentionBench show models internally registering doubt while still giving confidently phrased answers. The good news is that this should be fixable: models need fine-tuning to express uncertainty and confidence explicitly. Otherwise, we risk hiding failures behind clever rhetoric.

Jun 19

4:40 PM
