MAMLM: The unreasonable effectiveness of MAMLM in simulating linguistic and artistic behavior is deeply disturbing. And, given that, the failure of MAMLM at almost all reasoning tasks in which they can neither draw almost immediately on their training data nor be “Clever Hansed” to an extreme is also very disturbing. For example, ask for a Chicago Manual of Style citation to one of my weblog posts, and it fails to reason its way to the right answer:

Will Douglas Heaven: Large language models can do jaw-dropping things. But nobody knows exactly why: ‘And that’s a problem…. “Obviously, we’re not completely ignorant,” says Mikhail Belkin, a computer scientist at the University of California, San Diego. “But our theoretical analysis is so far off what these models can do. Like, why can they learn language? I think this is very mysterious.”… “This is something that, until recently, we thought should not work,” says Belkin. “That means that something was fundamentally missing. It identifies a gap in our understanding of the world.” Belkin… thinks there could be a hidden mathematical pattern in language that large language models somehow come to exploit: “Pure speculation but why not? The fact that these things model language is probably one of the biggest discoveries in history,” he says. “That you can learn language by just predicting the next word with a Markov chain—that’s just shocking to me”…’ <technologyreview.com/2024/03/04/1089403…>
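Belkin’s line about “predicting the next word with a Markov chain” is worth unpacking. Here is a minimal sketch of what first-order (bigram) next-word prediction looks like in the simplest case; the toy corpus and function names are purely illustrative, and an actual LLM is vastly more elaborate, but the interface is the same: given what came before, sample the next word.

```python
import random
from collections import defaultdict

def train_bigram_model(text: str) -> dict:
    """Count bigram transitions: for each word, how often each word follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model: dict, word: str) -> str | None:
    """Sample a next word in proportion to how often it followed `word` in training."""
    followers = model.get(word)
    if not followers:
        return None
    candidates, weights = zip(*followers.items())
    return random.choices(candidates, weights=weights)[0]

# Toy corpus (illustrative only): generate a short continuation from a seed word.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
word, generated = "the", ["the"]
for _ in range(6):
    word = predict_next(model, word)
    if word is None:
        break
    generated.append(word)
print(" ".join(generated))
```

A lookup table of transition counts like this produces locally plausible word salad; that a transformer trained on essentially the same next-word objective, at scale, produces coherent prose is exactly the gap in understanding Belkin is pointing to.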
