June 9, 2026
Attention Weight Is Not Information Flow
The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.
Browse posts by tag
The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.