Policy Gradient

June 9, 2026

Reinforcement Learning Is Cross-Entropy, Reweighted by Reward

When the only signal is a number at the end of a trajectory: how REINFORCE turns out to be the same gradient as cross-entropy classification, scaled by return, and where the theory tops out.

April 24, 2026

Reinforcement Learning: An Introduction

Notes

The RL bible. Bandits to policy gradients to planning.