Neural Networks

Browse posts by tag

Attention Is a Learned Pointer Dereference

An attention head is a learned content-addressable lookup: a query matches keys, retrieves a value, exactly like dereferencing a pointer. Depth is how many lookups you can compose.

Attention Weight Is Not Information Flow

The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.

Inductive Biases in Neural Networks

What a Convolution Assumes

A convolution is a bet about images: nearby pixels matter together, and a feature detector should fire anywhere. How to test whether a model is actually using that bet.

The Architecture Is the Prior

Part 3 of What Your RL Algorithm Actually Assumes — the architecture decides what kind of features can be learned, and that decision is a Bayesian prior over value functions.

technical

Dive into Deep Learning

Notes

Free open-source deep learning book with code and math integrated. Interactive deep learning resource with runnable code.