Interesting: OpenAI added a learnable per-head bias to the attention logits before softmax, then dropped the bias's probability mass after softmax. I think this lets each head shed attention onto the bias instead of forcing huge logits onto real tokens, so you can pre-train your LLMs without large activation outliers, which I think is the secret behind the easy quantization they pulled off with the gpt-oss models. A minimal sketch of how I read the mechanism is below.
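Here is a minimal PyTorch sketch of that reading, not OpenAI's actual code: a learnable "sink" logit per head is appended before softmax so it competes for probability mass, then its column is deleted afterwards. All names (`SinkSoftmax`, `sink`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SinkSoftmax(nn.Module):
    """Softmax over attention logits with a learnable per-head sink bias.

    The bias acts as an extra logit that takes part in the softmax but is
    discarded afterwards, so each query's attention weights can sum to < 1.
    This is a sketch of the idea, not OpenAI's implementation.
    """
    def __init__(self, num_heads: int):
        super().__init__()
        # One learnable sink logit per attention head (assumption).
        self.sink = nn.Parameter(torch.zeros(num_heads))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # logits: (batch, heads, q_len, k_len)
        b, h, q, k = logits.shape
        sink = self.sink.view(1, h, 1, 1).expand(b, h, q, 1)
        # Append the sink as an extra "key" column before softmax...
        probs = F.softmax(torch.cat([logits, sink], dim=-1), dim=-1)
        # ...then delete it, leaving weights that sum to <= 1 per query.
        return probs[..., :k]
```

The point of dropping the sink column is that the head never has to place near-1.0 weight on some arbitrary token (the classic attention-sink behavior) to "do nothing", which is one plausible way extreme activation outliers get avoided during pre-training.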
Aug 5 at 9:06 PM