
Arguably, even AI models have "waves". It's just that in the currently dominant transformer paradigm there is only a single wave going from input to output, with no oscillations or recurrences. This used to be different: recurrent models like LSTMs, Neural Turing Machines, and other older architectures could be described as oscillating. It turned out, however, that these recurrent models are hard to train with current methods and hard to scale to large amounts of data. Interestingly, recurrent models are thought to be better at reasoning tasks and algorithmic learning, areas where transformers are comparatively weak ("weak" here meaning relative to their other strengths, like memorization and language understanding).
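To make the contrast concrete, here is a minimal sketch (toy dimensions, random weights, no attention or gating, purely for illustration): a feedforward pass processes each input in one shot, while a recurrent cell feeds its hidden state back into itself at every timestep.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen arbitrarily for this sketch
d_in, d_hid, T = 4, 8, 5
x = rng.normal(size=(T, d_in))  # a sequence of T input vectors

# Feedforward: a single "wave" from input to output, no state carried
# between positions (loosely like one transformer sublayer, minus attention).
W_ff = rng.normal(size=(d_in, d_hid))
ff_out = np.tanh(x @ W_ff)      # shape (T, d_hid)

# Recurrent: the hidden state h at step t depends on h at step t-1,
# as in a vanilla RNN cell (LSTMs add gating on top of this idea).
W_xh = rng.normal(size=(d_in, d_hid))
W_hh = rng.normal(size=(d_hid, d_hid))
h = np.zeros(d_hid)
states = []
for t in range(T):
    h = np.tanh(x[t] @ W_xh + h @ W_hh)  # the recurrence / "oscillation"
    states.append(h)
rnn_out = np.stack(states)      # shape (T, d_hid)
```

The sequential loop is exactly what makes these models hard to parallelize across timesteps during training, while the feedforward version computes all positions at once.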
