April 24, 2026
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Notes
IO-aware exact attention that is both faster and more memory-efficient than standard attention. Essential infrastructure.