FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Dao, Fu, Ermon, Rudra, Ré
Notes
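Exact attention made IO-aware: the computation is tiled so that the full N×N attention matrix is never written to GPU high-bandwidth memory (HBM). That makes it faster than standard attention and cuts memory use from quadratic to linear in sequence length. Essential infrastructure.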
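
To make the idea concrete, here is a minimal NumPy sketch of the tiling and running-softmax trick that lets attention be computed exactly without storing the whole score matrix. It is an illustration only, not the authors' kernel: the real implementation is a fused GPU kernel that also tiles over queries and keeps blocks in on-chip SRAM. The function names (`naive_attention`, `tiled_attention`) and the `block_size` parameter are hypothetical, chosen for this example.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference version: materializes the full (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def tiled_attention(q, k, v, block_size=16):
    """Same result, but only one (n, block_size) slice of scores exists at a time."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[-1]))      # unnormalized output accumulator
    row_max = np.full((n, 1), -np.inf)    # running max of scores per query row
    row_sum = np.zeros((n, 1))            # running softmax denominator per row

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]

        scores = (q @ k_blk.T) * scale
        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))

        # Rescale what was accumulated under the old max to the new max.
        # On the first block row_max is -inf, so the correction is exactly 0.
        correction = np.exp(row_max - new_max)
        p = np.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum


rng = np.random.default_rng(0)
q = rng.standard_normal((50, 8))
k = rng.standard_normal((50, 8))
v = rng.standard_normal((50, 8))
print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v)))
```

The two functions agree up to floating-point error, which is the sense in which the method is "exact": the blockwise version changes the order of memory accesses, not the mathematical result.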