
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafailov, Sharma, Mitchell, Ermon, Manning, Finn
paper completed ai-ml

Notes

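Bypasses explicit reward modeling and RL entirely. Instead of fitting a separate reward model and then optimizing against it with PPO, DPO trains the policy directly on preference pairs (chosen vs. rejected response) with a simple classification-style loss. Simpler and more stable alignment, with results reported as matching or beating PPO-based RLHF on sentiment control, summarization, and single-turn dialogue.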
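A minimal sketch of the per-example DPO loss, assuming the summed sequence log-probabilities of the chosen (y_w) and rejected (y_l) responses have already been computed under both the trainable policy and a frozen reference model (typically the SFT model). The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions, not taken from the paper's code. The loss is -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]).

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(m)) == softplus(-m); computed in a numerically stable way
    # so that large |m| does not overflow math.exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raised the chosen response's likelihood: loss drops below log(2).
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

The beta-scaled log-ratio terms are the "implicit reward" the title refers to: the policy itself defines a reward relative to the reference model, so no separate reward network is trained. Minimizing the loss widens the margin between chosen and rejected responses, while beta controls how far the policy may drift from the reference.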