SmolLM2 1.7B is a very good alternative to Qwen2.5 1.5B/3B and Llama 3.2.
Fully open (training data, recipe, etc. all to be released)
Better than or on par with models of similar size on most tasks
Much smaller vocabulary => smaller activations => much cheaper fine-tuning!
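To make the vocabulary point concrete, here is a rough back-of-envelope sketch (my own illustration, not from the release notes): the output logits tensor scales linearly with vocabulary size, so a smaller vocab directly shrinks peak activation memory during fine-tuning. The vocab sizes below are assumptions for illustration; check the model cards for exact values.

```python
def logits_bytes(batch: int, seq_len: int, vocab_size: int, bytes_per_elem: int = 4) -> int:
    """Memory for one fp32 logits tensor of shape (batch, seq_len, vocab_size)."""
    return batch * seq_len * vocab_size * bytes_per_elem

# Assumed vocab sizes: SmolLM2 uses ~49k tokens, Qwen2.5 ~152k.
smol = logits_bytes(batch=8, seq_len=2048, vocab_size=49_152)
qwen = logits_bytes(batch=8, seq_len=2048, vocab_size=151_936)

print(f"SmolLM2 logits: {smol / 2**30:.2f} GiB")  # ~3 GiB
print(f"Qwen2.5 logits: {qwen / 2**30:.2f} GiB")
print(f"ratio: {qwen / smol:.1f}x smaller for SmolLM2")
```

With these numbers the logits alone are roughly 3x smaller, before even counting the savings in the embedding and LM-head weights (and their gradients and optimizer states).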
The 135M version is also a big improvement over the first iteration of SmolLM 135M, but I still struggle to find applications for a model this small, other than educational ones.