
Besides size, the other difference is that most decoder models are now trained on far more data (e.g., the 0.5B Qwen 2.5 model was trained on 18 trillion tokens).
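To put that example in perspective, here is a quick back-of-the-envelope ratio of training tokens to parameters. It uses only the two nominal figures quoted above (0.5B parameters, 18 trillion tokens), so treat the result as approximate rather than an exact property of the model.

```python
# Back-of-the-envelope: training tokens per model parameter.
# Assumption: nominal figures from the note (0.5B params, 18T tokens).
params = 0.5e9   # nominal parameter count
tokens = 18e12   # pretraining tokens

tokens_per_param = tokens / params
print(f"{tokens_per_param:,.0f} tokens per parameter")
```

This prints `36,000 tokens per parameter`, i.e. each parameter "sees" tens of thousands of training tokens on average.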

That being said, here are some additional comparisons to BERT models:

Apr 21 at 3:12 AM
