Training language models to follow instructions with human feedback (InstructGPT)
Ouyang, Wu, Jiang, Almeida, Wainwright, Mishkin, Zhang, et al.
Notes
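RLHF (reinforcement learning from human feedback) applied to GPT-3 in three stages: supervised fine-tuning on human demonstrations, a reward model trained on human preference rankings, then PPO against that reward model. The bridge from raw LM to useful assistant.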
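A rough sketch of the pairwise preference loss that RLHF reward models are typically trained with: the negative log-sigmoid of the reward gap between the response a labeler preferred and the one they rejected. The function names and the toy scores below are illustrative assumptions, not code from the paper, and a real reward model would produce the scores from a neural network rather than take them as plain floats.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration. The loss is small when the
    preferred response scores well above the rejected one, and large
    when the ordering is reversed.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() is only ever called on a non-positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (reward_chosen, reward_rejected) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


# Toy scores (made up): the first pair is ranked correctly, the second is not.
example_pairs = [(2.0, 0.5), (0.1, 1.3)]
print(mean_pairwise_loss(example_pairs))
```

Minimizing this loss pushes the reward model to score preferred responses higher than rejected ones; the policy is then optimized against those scores in the final RL stage.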