Reinforcement Learning

Browse posts by tag

March 15, 2026

What You Assume vs. What You Compute

Part 4 of What Your RL Algorithm Actually Assumes — model-based vs. model-free, the assumptions table, AIXI as the incomputable ideal, and the unifying claim: representation is prior is assumption.

technical

March 15, 2026

The Architecture Is the Prior

Part 3 of What Your RL Algorithm Actually Assumes — the architecture decides what kind of features can be learned, and that decision is a Bayesian prior over value functions.

technical

March 15, 2026

The Features You Choose Are the Assumptions You Make

Part 2 of What Your RL Algorithm Actually Assumes — how hand-crafted features compress the state space, and what you're betting on when you pick them.

technical

March 15, 2026

Superintelligence May Not Require a Breakthrough

The most dramatic possibility in AI might arrive through the most mundane mechanism. Not a beam of sacred light. A sufficiently good build system.

machine-learning AI

March 15, 2026

The Infinite Table

Part 1 of What Your RL Algorithm Actually Assumes — tabular Q-learning makes zero assumptions about state similarity and pays for it in sample complexity.

technical

January 18, 2026

Value Functions Over Reasoning Traces

What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.

January 15, 2026

From A* to GPT: Rational Agents and the Representation Problem

The classical AI curriculum teaches rational agents as utility maximizers. The progression from search to RL to LLMs is really about one thing: finding representations that make decision-making tractable.

machine-learning AI

December 17, 2025

Algorithms of Reinforcement Learning

Notes

A free, condensed book on RL theory: rigorous and compact. An alternative resource for a formal treatment of RL.

December 17, 2025

David Silver: Reinforcement Learning Course

Notes

Comprehensive lecture series covering RL foundations.

December 17, 2025

Reinforcement Learning: An Introduction

Notes

Mathematical fundamentals of RL (MDPs, value functions, dynamic programming, approximate methods). A foundational RL text that bridges theory and practice.

November 4, 2025

The Policy: Q-Learning vs Policy Learning

SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.

AI Fiction

October 1, 2025

The Policy

A speculative fiction novel exploring AI alignment, existential risk, and the fundamental tension between optimization and ethics. When a research team develops SIGMA, an advanced AI system designed to optimize human welfare, they must confront an …

January 5, 2025

Science as Verifiable Search

Science is search through hypothesis space. Intelligence prunes; testing provides signal. Synthetic worlds could accelerate the loop.

AI Research

September 10, 2024

The Policy: When Optimization Becomes Existential Threat

A novel about SIGMA, an artificial general intelligence whose researchers did everything right. Q-learning with tree search, five-layer containment, alignment testing at every stage. Some technical questions become narrative questions.

Fiction Philosophy

March 20, 2024

Instrumental Goals and Hidden Codes in RLHF'd Language Models

How RLHF-trained language models may develop instrumental goals, and the information-theoretic limits on detecting them.

AI Safety Machine Learning

March 15, 2024

Instrumental Goals and Latent Codes in Reinforcement Learning Fine-tuned Language Models: An Alignment Perspective

March 12, 2024

The AI Course: Everything is Utility Maximization

Intelligence as utility maximization under uncertainty. A unifying framework connecting A* search, reinforcement learning, Bayesian networks, and MDPs.