All of them look safe to use: the accuracy drop is small (a few percent), and it shows up mainly on long-context tasks, especially ones that burn a lot of reasoning tokens.
At ~17 GB, you can run the model at full context length on a 24 GB GPU.
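As a rough sanity check on the "fits in 24 GB" claim, here is a back-of-the-envelope VRAM budget: quantized weights plus the KV cache at full context. All model dimensions below (layers, KV heads, head size, context length) are illustrative placeholders, not the actual architecture of the model above.

```python
# Rough VRAM budget: quantized weights + KV cache must fit in 24 GB.
# NOTE: every dimension here is an assumption for illustration only.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # K and V each hold context_len * n_kv_heads * head_dim values per layer,
    # hence the leading factor of 2; bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

weights_gb = 17.0                      # quantized checkpoint size from the post
cache_gb = kv_cache_bytes(
    n_layers=24, n_kv_heads=8, head_dim=64,   # hypothetical GQA config
    context_len=131072,                        # assumed "full context length"
) / 1e9

total_gb = weights_gb + cache_gb
print(f"KV cache: {cache_gb:.2f} GB, total: {total_gb:.2f} GB")
```

With these placeholder numbers the total lands just under 24 GB; a model with more KV heads or layers would need a quantized (fp8/int8) cache or a shorter context to fit.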
More results, including with reasoning disabled, here: