Sebastian Raschka, PhD (@rasbt): "Efficiency and performance tweaks in the transformer architecture usually focus(ed) on the normalization, attention, and FFN modules. Well, here is a New Year’s gift from DeepSeek (https://arxiv.org/abs/2512.24880). Finally some improvements of the residual path as well."

Jan 1 at 4:43 PM
