SmolLM2 1.7B is a very good alternative to Qwen2.5 1.5B/3B and Llama 3.2.

  • fully open (training data, recipe, etc. all to be released)

  • as good as or better than models of similar size on most tasks

  • much smaller vocabulary => smaller activations (and a smaller embedding/LM-head) => much cheaper fine-tuning! (rough sketch after this list)

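To make the vocabulary point concrete, here is a rough back-of-the-envelope sketch, not from the post itself: during fine-tuning the logits tensor has shape (batch, seq_len, vocab_size), so a smaller vocabulary directly shrinks one of the largest activation tensors. The vocabulary sizes below are approximate values taken from the public model configs.

```python
# Hypothetical back-of-the-envelope estimate: memory taken by a single
# logits tensor of shape (batch, seq_len, vocab_size) in bf16.
# Ignores gradients, the softmax copy, and all other activations.

BYTES_PER_VALUE = 2  # bf16

def logits_memory_gb(batch: int, seq_len: int, vocab_size: int) -> float:
    """Approximate size of one logits tensor in GB."""
    return batch * seq_len * vocab_size * BYTES_PER_VALUE / 1e9

# Approximate vocabulary sizes from the public model configs (assumption).
for name, vocab in [("SmolLM2 (~49k)", 49_152),
                    ("Llama 3.2 (~128k)", 128_256),
                    ("Qwen2.5 (~152k)", 151_936)]:
    gb = logits_memory_gb(batch=8, seq_len=2048, vocab_size=vocab)
    print(f"{name:>18}: {gb:.2f} GB")
```

Under these assumptions the logits tensor is roughly 3x smaller for SmolLM2 than for Qwen2.5, which is where a good part of the fine-tuning savings comes from.
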
The 135M version is also much better than the first iteration of SmolLM 135M, but I still struggle to find applications for this model other than educational purposes.

Nov 3 at 5:01 PM