June 9, 2026
Attention Weight Is Not Information Flow
The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.
Browse posts by tag
The trained pointer model reads exactly the right memory cell, provably. Its attention barely shows where. The gap, and the causal probe that closes it.
175B parameters. In-context learning emerges at scale. Changed the field.