Training language models to follow instructions with human feedback (InstructGPT)

Ouyang, Wu, Jiang, Almeida, Wainwright, Mishkin, Zhang, et al.

paper completed ai-ml

Year 2022

External Link https://arxiv.org/pdf/2203.02155

RLHF instruction following alignment from:language-models

Notes

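RLHF applied to GPT-3: supervised fine-tuning on human demonstrations, a reward model trained on human preference rankings, then PPO against that reward model. The bridge from raw LM to useful assistant.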
