Five paradigms define how language models are taught after pretraining.

  1. Imitation: show the model what to say

  2. Preference: ask which of two answers is better

  3. Outcome: reward correct final answers

  4. Process: grade each step

  5. Self: train against the model's own outputs

Each paradigm specifies less than the previous. Each leaves more to the model. By Part 5, the supervisor itself is being trained.

The Age of Post-Training, Part 1: Learning by Imitation. Live now.

May 8 at 7:20 PM