January 18, 2026
Value Functions Over Reasoning Traces
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
Why the simplest forms of learning are incomputable, and what that means for the intelligence we can build.