SmolLM2 1.7B is a very good alternative to Qwen2.5 1.5B/3B and Llama 3.2.
fully open (training data, training recipe, etc. all to be released)
as good as or better than models of similar size on most tasks
much smaller vocabulary => smaller activations => much cheaper fine-tuning!
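A quick way to see what the smaller vocabulary buys you is to compare the vocab sizes and the size of the per-step logits tensor. The sketch below is illustrative only: the model IDs are what I believe the Hub checkpoints are called, and the batch size / sequence length are arbitrary assumptions.

```python
# Rough sketch: smaller vocab => smaller embedding/unembedding layers and a much
# smaller logits tensor during fine-tuning. Model IDs, batch size and sequence
# length are assumptions, not numbers from the release.
from transformers import AutoConfig

for model_id in ["HuggingFaceTB/SmolLM2-1.7B", "Qwen/Qwen2.5-1.5B"]:
    cfg = AutoConfig.from_pretrained(model_id)
    # logits for one step are batch * seq_len * vocab_size floats (fp32 here)
    logits_gb = 8 * 2048 * cfg.vocab_size * 4 / 1e9
    print(f"{model_id}: vocab_size={cfg.vocab_size}, logits/step ≈ {logits_gb:.1f} GB")
```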
The 135M version is also much better than the first iteration of SmolLM 135M, but I still struggle to find applications for this model other than educational purposes.
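For those educational purposes, a minimal sketch (assuming the checkpoint is published as HuggingFaceTB/SmolLM2-135M-Instruct) that is small enough to run on a laptop CPU:

```python
# Tiny demo of the 135M model; the model ID is an assumption about the Hub name.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
print(generator("Explain what a tokenizer does:", max_new_tokens=60)[0]["generated_text"])
```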
Nov 3, 2024 at 5:01 PM