Neural-Networks

July 4, 2026

Watch: Inductive Biases, What Your Architecture Assumes

The inductive-biases series is now a ten-episode animated playlist: every architecture is a bet about the world, and the bet is the bias. No free lunch first, then each architecture read as its assumptions, graded on one scorecard.

June 9, 2026

A Network Computes Numbers. The Loss Decides What They Mean.

Function approximation, why one linear unit cannot learn XOR, and what a hidden layer actually buys. The opening of a from-scratch tour of inductive bias.

June 9, 2026

Attention Is a Learned Pointer Dereference

An attention head is a learned content-addressable lookup: a query matches keys, retrieves a value, exactly like dereferencing a pointer. Depth is how many lookups you can compose.

June 9, 2026

Attention Weight Is Not Information Flow

The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.

June 9, 2026

Bengio's Language Model: the Markov Assumption Made Architectural

The simplest neural language model: embed the last N tokens, concatenate, predict the next. What you give up with a fixed window, and what you gain by dropping recurrence.

June 9, 2026

Inductive Biases in Neural Networks

June 9, 2026

Recurrence Is Weight Sharing Across Time, and It Costs You

A recurrent network reuses one cell at every timestep and carries a state. That buys time-translation equivariance and unbounded reach in principle, and bills you in vanishing gradients.

June 9, 2026

Reinforcement Learning Is Cross-Entropy, Reweighted by Reward

What to do when the only signal is a number at the end of a trajectory. How REINFORCE turns out to be the same gradient as classification, scaled by return, and where the theory tops out.

June 9, 2026

The Loss Function Is a Distribution Assumption

Choosing a loss is choosing a distribution for your output. Why every supervised network is a maximum-likelihood estimator, and why the gradient is always the residual.

June 9, 2026

What a Convolution Assumes

A convolution is a bet about images: nearby pixels matter together, and a feature detector should fire anywhere. How to test whether a model is actually using that bet.

March 15, 2026

The Architecture Is the Prior

Part 3 of What Your RL Algorithm Actually Assumes: the architecture decides what kind of features can be learned, and that decision is a Bayesian prior over value functions.
