Hugo (@robonaissance):


Five paradigms define how language models are taught after pretraining.

Imitation: show the model what to say
Preference: ask which of two answers is better
Outcome: reward correct final answers
Process: grade each step
Self: train against the model's own outputs

Each paradigm specifies less than the previous. Each leaves more to the model. By Part 5, the supervisor itself is being trained.

The Age of Post-Training, Part 1: Learning by Imitation. Live now.


May 8 at 7:20 PM
