Benjamin Marie (@bnjmnmarie)


I evaluated 4-bit variants of GLM-4.7 Flash.

All of them look safe to use, with only a small (few-percent) accuracy drop on long-context tasks, especially those that require a lot of reasoning tokens.

At ~17 GB, you can run the model at full context length on a 24 GB GPU.
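
The memory budget behind that claim can be sketched as simple arithmetic. This is a rough illustration, not the method used in the evaluation: the function name `vram_headroom_gb` and the 1 GB runtime reserve are assumptions made for the example; only the ~17 GB model size and the 24 GB GPU come from the post.

```python
def vram_headroom_gb(gpu_gb: float, weights_gb: float, reserved_gb: float = 1.0) -> float:
    """Return the memory left for KV cache and activations, in GB.

    reserved_gb is an assumed allowance for the CUDA context and
    framework overhead; it is not a measured value.
    """
    headroom = gpu_gb - weights_gb - reserved_gb
    if headroom < 0:
        raise ValueError("weights plus reserve exceed GPU memory")
    return headroom


# Figures from the post: ~17 GB quantized model, 24 GB GPU.
headroom = vram_headroom_gb(gpu_gb=24.0, weights_gb=17.0)
print(f"{headroom:.1f} GB left for KV cache and activations")
```

Whether that headroom actually covers the full context length depends on the model's KV cache size per token, which the post does not state.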

More results, including with reasoning disabled, here:

The Kaitchup – AI on a Budget

Run GLM-4.7 Flash on One GPU: VRAM Math, Quantization Options, and Benchmark Results

Feb 10

1:41 PM
