The inductive-biases series is now a playlist. Ten episodes, animated and narrated, built from the same posts and the same code.
The series starts from one claim: every learner that generalizes is making a bet about the world, and the bet is the bias. It opens with the theory that makes the bet mandatory. There is no free lunch: averaged uniformly over all possible worlds, no learning algorithm beats guessing on data it has not seen, so any method that wins somewhere has to lose somewhere else, and the wins come only from assumptions that happen to match the world you are actually in. That is the whole game. You do not get generalization for free; you buy it with a prior, and you pay for it out of distribution.
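The claim is small enough to check by brute force. The sketch below is a toy illustration, not code from the series or from scratchnn: it enumerates every possible labelling of the eight 3-bit inputs, trains two opposite learners on the same five points, and scores each on the three held-out points. The learner names and the 5/3 split are hypothetical choices made for the example.

```python
from itertools import product

# Every 3-bit input. A "world" is any assignment of a 0/1 label to each input.
INPUTS = list(product([0, 1], repeat=3))
TRAIN_X = INPUTS[:5]   # assumed split: five points seen in training
TEST_X = INPUTS[5:]    # three points held out


def majority_learner(train):
    """Predict the most common training label everywhere."""
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones > len(train) else 0
    return lambda x: guess


def anti_majority_learner(train):
    """Predict the least common training label everywhere."""
    ones = sum(label for _, label in train)
    guess = 0 if 2 * ones > len(train) else 1
    return lambda x: guess


def average_offtrain_accuracy(learner):
    """Held-out accuracy averaged uniformly over all 2**8 worlds."""
    correct = 0
    total = 0
    for labels in product([0, 1], repeat=len(INPUTS)):
        world = dict(zip(INPUTS, labels))
        train = [(x, world[x]) for x in TRAIN_X]
        predict = learner(train)
        for x in TEST_X:
            correct += int(predict(x) == world[x])
            total += 1
    return correct / total


print(average_offtrain_accuracy(majority_learner))
print(average_offtrain_accuracy(anti_majority_learner))
```

Both learners land on the same average, because for any fixed training set the held-out labels take every combination equally often. A learner only pulls ahead once the average is restricted to worlds that match its assumption.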
From there the series reads each major neural architecture as a bundle of assumptions. What the loss function presumes about noise. What a convolution presumes about space. What Bengio’s language model presumes about time, what recurrence and attention presume about memory, and what the policy gradient presumes about reward. Every architecture episode fills in the same scorecard: the bias it hardwires, the sample-efficiency win when that bias is true, and the bill that comes due when it is not. By the end the scorecard is a habit, and you find yourself reading any new model the same way, by asking what it assumes before asking what it does.
It stays sharp where the sources are sharp. Attention weights are not information flow, however much the heatmaps invite you to read them that way. The loss function is a distribution assumption whether you chose it deliberately or reached for the default. The series is a dialogue, and the student voices the conventional view sincerely, the one most people actually hold, so the teacher can take it apart with the real argument rather than a strawman.
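The loss-as-distribution point can be made concrete with a one-parameter model. The following is a minimal sketch in plain Python, not taken from scratchnn: it fits a single constant to five numbers, one of them an outlier, and compares squared error against a Gaussian negative log-likelihood, and absolute error against a Laplace negative log-likelihood. The data, the candidate grid, and the fixed scale parameters `sigma` and `b` are assumptions chosen for illustration.

```python
import math


def squared_error(ys, pred):
    return sum((y - pred) ** 2 for y in ys)


def gaussian_nll(ys, pred, sigma=1.0):
    # Negative log-likelihood of ys under Normal(pred, sigma**2).
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const + (y - pred) ** 2 / (2 * sigma ** 2) for y in ys)


def absolute_error(ys, pred):
    return sum(abs(y - pred) for y in ys)


def laplace_nll(ys, pred, b=1.0):
    # Negative log-likelihood of ys under Laplace(pred, b).
    return sum(math.log(2 * b) + abs(y - pred) / b for y in ys)


ys = [1.0, 2.0, 3.0, 4.0, 100.0]            # toy data with one outlier
candidates = [i / 10 for i in range(1001)]  # constant predictions 0.0 .. 100.0

best_sq = min(candidates, key=lambda c: squared_error(ys, c))
best_gauss = min(candidates, key=lambda c: gaussian_nll(ys, c))
best_abs = min(candidates, key=lambda c: absolute_error(ys, c))
best_laplace = min(candidates, key=lambda c: laplace_nll(ys, c))

print(best_sq, best_gauss)      # squared error and Gaussian NLL agree
print(best_abs, best_laplace)   # absolute error and Laplace NLL agree
```

With the scale held fixed, each loss differs from its likelihood only by a constant and a positive factor, so the minimizers coincide: the mean under the Gaussian assumption, the median under the Laplace one. The outlier drags the first and barely moves the second, which is the distribution assumption showing through.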
The theoretical spine draws on my book On Intelligence and Its Specifications, specifically its account of why generalization is possible at all and what it costs. The code on screen is real, taken from scratchnn, a neural network library written in pure Python so you can read every line. This series is about meaning, not mechanism: how the gradients actually flow is its own from-scratch series, coming later. Here the question is always what the architecture assumes.
The playlist
Open the playlist on YouTube (10 episodes)
Episode list
- Learning Without Assumptions Is Impossible
- Sample Efficiency and the Price of Priors
- A Network Computes Numbers. The Loss Decides What They Mean.
- The Loss Function Is a Distribution Assumption
- What a Convolution Assumes
- Bengio's Language Model: The Markov Assumption Made Architectural
- Recurrence Is Weight Sharing Across Time, and It Costs You
- Attention Is a Learned Pointer Dereference
- Attention Weight Is Not Information Flow
- Reinforcement Learning Is Cross-Entropy, Reweighted by Reward