Value Functions Over Reasoning Traces
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
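To make the framing concrete, here is a minimal sketch, not the post's actual implementation: a trace store where each saved reasoning trace carries a learned usefulness value, updated only from the scalar reward at the end of an episode. All names here (`TraceMemory`, `retrieve`, `update`, the learning rate) are hypothetical illustrations of the idea.

```python
import numpy as np

class TraceMemory:
    """Hypothetical trace store: each stored reasoning trace carries a
    learned estimate of its own usefulness, trained from one scalar
    reward per episode."""

    def __init__(self, lr: float = 0.1):
        self.lr = lr
        self.embeddings: list[np.ndarray] = []  # unit-norm trace embeddings
        self.traces: list[str] = []             # raw reasoning traces
        self.values: list[float] = []           # learned usefulness per trace

    def add(self, trace: str, embedding: np.ndarray) -> None:
        """Store a new trace with a neutral initial value."""
        self.traces.append(trace)
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.values.append(0.0)

    def retrieve(self, query: np.ndarray, k: int = 3) -> list[int]:
        """Rank traces by cosine similarity plus learned value, so traces
        that have historically preceded success float upward."""
        q = query / np.linalg.norm(query)
        scores = [float(e @ q) + v
                  for e, v in zip(self.embeddings, self.values)]
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

    def update(self, used: list[int], reward: float) -> None:
        """The single reward signal: every trace retrieved during the
        episode moves its value toward the episode's outcome."""
        for i in used:
            self.values[i] += self.lr * (reward - self.values[i])
```

Under this framing, the only supervision is the episode-level scalar reward: traces are never labeled as useful or useless directly, yet their values converge toward their average contribution to success, which is the sense in which one reward signal could be enough.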