April 24, 2026
Deep Reinforcement Learning from Human Preferences
Notes
Foundational RLHF paper. Learning reward models from human comparisons.