I've been thinking about this article since Chris highlighted it in Scalene: researchinformation.inf…
Gunter argues that AI has made research integrity problems (retractions, corrections, versioning, provenance) impossible to manage with today’s patchy, voluntary metadata systems, so scholarly publishing needs a neutral, nonprofit governance body that can set and enforce machine-readable integrity standards at the infrastructure level.
I’m not entirely convinced we need a new body for the part Gunter describes as “a single, authoritative registry of retractions, corrections, and expressions of concern, available via APIs and embedded directly into metadata pipelines used by publishers, indexers, and AI systems.” Crossref and Retraction Watch already cover much of that ground, and new organisations are costly to get off the ground. I’m also unconvinced we want to erase flawed research from the record; the real need is to ensure machines don’t treat it as clean, context-free truth. It would be strange, for example, to have a training data set that excluded Wakefield’s autism paper but included all the subsequent commentary.
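To make the Crossref point concrete: retraction notices in Crossref metadata carry an “update-to” field pointing at the DOI they retract, retrievable from the public REST API (GET https://api.crossref.org/works/{doi}). A minimal sketch of reading that signal, using an inlined sample record rather than a live API call; the DOIs here are hypothetical placeholders, and the record shape is a simplified assumption about the real response:

```python
# Illustrative Crossref-style work record (simplified, assumed shape).
# Real responses come from GET https://api.crossref.org/works/{doi};
# a retraction notice lists the DOI(s) it retracts under "update-to".
sample_notice = {
    "message": {
        "DOI": "10.0000/retraction-notice",         # hypothetical DOI
        "update-to": [
            {
                "type": "retraction",
                "DOI": "10.0000/original-article",  # hypothetical DOI
                "updated": {"date-parts": [[2010, 2, 2]]},
            }
        ],
    }
}

def retracted_dois(work_record):
    """Return the DOIs that this record marks as retracted."""
    updates = work_record["message"].get("update-to", [])
    return [u["DOI"] for u in updates if u.get("type") == "retraction"]

print(retracted_dois(sample_notice))  # → ['10.0000/original-article']
```

The point isn’t that this solves integrity metadata, just that the machine-readable plumbing for retraction status largely exists; the gap is coverage and consistent use downstream, not the registry itself.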
The harder, more interesting part of the article is provenance. Retractions are a status label. Provenance is a map. Understanding the provenance of an article probably isn't something we're going to get to with current LLMs.
Scalene newsletter: