June 9, 2026
Attention Is a Learned Pointer Dereference
An attention head is a learned content-addressable lookup: a query is matched against keys and retrieves the associated value, much like dereferencing a pointer. Depth determines how many such lookups you can compose.
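The lookup analogy can be made concrete with a minimal sketch, assuming NumPy and a single query vector. The names `soft_lookup`, `temperature`, and the toy four-slot memory are illustrative assumptions, not taken from the post. With one-hot keys and a low temperature, the softmax concentrates on one slot, so the head behaves almost like a hard dereference; feeding the retrieved value back in as the next query shows how two lookups compose.

```python
import numpy as np

def soft_lookup(query, keys, values, temperature=1.0):
    """Scaled dot-product attention for one query: a soft key-value lookup."""
    # Similarity of the query to every key, scaled as in standard attention.
    scores = keys @ query / (np.sqrt(query.shape[-1]) * temperature)
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    # The result is a weighted average of values, not a single hard fetch.
    return weights @ values, weights

# Hypothetical toy memory: 4 slots addressed by one-hot keys.
keys = np.eye(4)
payload = np.array([[10.0], [20.0], [30.0], [40.0]])

# One lookup: the query matching slot 2 retrieves (approximately) its payload.
out, weights = soft_lookup(keys[2], keys, payload, temperature=0.01)

# Composed lookups: each slot stores the key of the next slot (a "pointer"),
# so the output of one lookup can serve as the query of the next.
next_pointer = keys[[1, 2, 3, 0]]
hop1, _ = soft_lookup(keys[0], keys, next_pointer, temperature=0.01)
hop2, _ = soft_lookup(hop1, keys, next_pointer, temperature=0.01)

print(int(weights.argmax()), float(out[0]))
print(int(hop1.argmax()), int(hop2.argmax()))
```

At higher temperatures the weights spread across several slots and the result becomes a blend of values, which is where the analogy to a hard pointer dereference loosens.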
Introduced the Transformer architecture. The paper that started everything.
IO-aware attention that is both faster and uses less memory. Essential infrastructure.
The evolution of neural sequence prediction, and how it connects to classical methods.