DeepSeek V4 Flash is interesting because it does not feel benchmark-maxed.
The numbers are good. Not “delete every other model” good.
But at $0.14 input and $0.28 output per 1M tokens, the evaluation changes. At that price, “good enough” becomes a much bigger category.
You can run more parallel attempts. Let agents explore. Use long context without flinching.
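To put rough numbers on that, here is a back-of-envelope sketch. The two prices are the ones quoted above. The workload (a 100k-token context, a 4k-token answer, 20 parallel attempts) is made up for illustration, and the function ignores any discounts or overhead a real bill might include.

```python
# Prices quoted above, in USD per 1M tokens.
INPUT_USD_PER_M = 0.14
OUTPUT_USD_PER_M = 0.28


def run_cost(input_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Estimated USD cost of `attempts` identical calls (hypothetical helper)."""
    per_attempt = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return per_attempt * attempts


# Assumed workload: 100k tokens in, 4k tokens out, 20 parallel attempts.
print(f"${run_cost(100_000, 4_000, attempts=20):.4f}")  # $0.3024
```

Under those assumptions, twenty long-context attempts cost about thirty cents, which is why retrying or fanning out stops feeling like a decision.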
Benchmarks measure peak capability. Cheap models change how people actually use them.
That is the part I think people are underrating.