April 24, 2026

Deep Reinforcement Learning from Human Preferences: Notes

Foundational RLHF paper (Christiano et al., 2017). Learns a reward model from human pairwise comparisons of trajectory segments, rather than from a hand-specified reward function.
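The core fitting objective in the paper treats a human's choice between two trajectory segments as a Bradley-Terry comparison: the probability that segment 1 is preferred is the softmax of the two segments' summed predicted rewards, and the reward model is trained with cross-entropy against the human labels. A minimal sketch of that loss (function names here are my own, and the reward predictions are passed in as plain arrays rather than produced by a network):

```python
import numpy as np

def segment_return(rewards):
    # Sum of the model's predicted per-step rewards over one segment.
    return float(np.sum(rewards))

def preference_prob(r1, r2):
    # Bradley-Terry model: P(segment 1 preferred over segment 2)
    # = exp(R1) / (exp(R1) + exp(R2)), computed via a sigmoid of the
    # return difference for numerical stability.
    R1, R2 = segment_return(r1), segment_return(r2)
    return 1.0 / (1.0 + np.exp(R2 - R1))

def preference_loss(r1, r2, label):
    # Cross-entropy against the human label: label=1 means the human
    # preferred segment 1, label=0 means segment 2.
    p = preference_prob(r1, r2)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))
```

In training, `r1` and `r2` would be the reward network's outputs on the two clips, and this loss would be minimized over a dataset of human comparisons; equal predicted returns give a preference probability of 0.5.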