Inductive Bias

Browse posts by tag

Attention Is a Learned Pointer Dereference

An attention head is a learned content-addressable lookup: a query matches keys, retrieves a value, exactly like dereferencing a pointer. Depth is how many lookups you can compose.

Attention Weight Is Not Information Flow

The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.

Inductive Biases in Neural Networks

What a Convolution Assumes

A convolution is a bet about images: nearby pixels matter together, and a feature detector should fire anywhere. How to test whether a model is actually using that bet.

What You Assume vs. What You Compute

Part 4 of What Your RL Algorithm Actually Assumes — model-based vs. model-free, the assumptions table, AIXI as the incomputable ideal, and the unifying claim: representation is prior is assumption.

technical

The Architecture Is the Prior

Part 3 of What Your RL Algorithm Actually Assumes — the architecture decides what kind of features can be learned, and that decision is a Bayesian prior over value functions.

technical