Five paradigms define how language models are taught after pretraining:
Imitation: show the model what to say
Preference: ask which of two answers is better
Outcome: reward correct final answers
Process: grade each step
Self: train against the model's own outputs
Each paradigm specifies less than the one before it and leaves more to the model. By Part 5, the supervisor itself is being trained.
The Age of Post-Training, Part 1: Learning by Imitation. Live now.