Besides size, the other big difference is that most decoder models are trained on far more data now (e.g., the 0.5B Qwen 2.5 model was trained on 18 trillion tokens, versus the roughly 3.3 billion words BERT was originally trained on).
That being said, here are some additional comparisons to BERT models:
github.com
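
For context, here's a minimal sketch of how you could do a parameter-count comparison yourself, assuming the Hugging Face transformers library and the public bert-base-uncased and Qwen/Qwen2.5-0.5B checkpoints:

```python
# Minimal sketch: compare parameter counts of an encoder-only and a
# decoder-only model. Assumes the Hugging Face `transformers` library
# and the public checkpoint names below.
from transformers import AutoModel

checkpoints = {
    "BERT-base": "bert-base-uncased",     # encoder-only, ~110M params
    "Qwen2.5-0.5B": "Qwen/Qwen2.5-0.5B",  # decoder-only, ~0.5B params
}

for name, repo_id in checkpoints.items():
    model = AutoModel.from_pretrained(repo_id)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```

Note this only captures model size; it says nothing about the training-token gap, which is where most of the difference comes from.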