The name 'large language models' is doing a lot of work in AI discourse.
It suggests LLMs have language. But whether that's true depends entirely on what you think language is. That question is rarely examined.
Language, in the technical sense, is not the words on a page or the sentences in a training corpus. It's a computational system of the human mind/brain, one that generates an infinite array of hierarchically structured expressions from a small set of primitive operations. That system is a biological property of human beings. LLMs don't have it.
(1) LLMs are trained on the products of human language use: texts, reports, outputs, not on language itself. (2) The core computational property of human language, Merge, is entirely absent from how LLMs produce sentences. (3) What LLMs do is probabilistic pattern matching over linguistic artefacts.
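The contrast in point (2) can be made concrete. A minimal sketch, assuming a toy lexicon: Merge takes two syntactic objects and forms a new, unordered set {X, Y}, and repeated application yields nested hierarchical structure, whereas a string-based model sees only a linear sequence of tokens. The phrase chosen here is purely illustrative.

```python
def merge(x, y):
    """Merge: combine two syntactic objects into an unordered set {x, y}."""
    return frozenset([x, y])

# Build "read the book" hierarchically: the determiner merges with the
# noun first, and that constituent then merges with the verb.
dp = merge("the", "book")   # {the, book}
vp = merge("read", dp)      # {read, {the, book}}

# The result is nested, not flat: "the book" is a constituent
# contained inside the verb phrase.
assert dp in vp
assert "book" in dp

# Contrast: a purely linear view of the same expression carries
# no constituency information at all.
linear = ["read", "the", "book"]
```

Nothing in a next-token objective requires this set-forming operation; an LLM can emit the string `read the book` without ever representing the fact that `the book` is a unit.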
That is interesting, impressive, whatever we like to call it. But it is not language.