FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Dao, Fu, Ermon, Rudra, Ré
paper completed ai-ml

Notes

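IO-aware exact attention: it tiles the computation so the full N×N attention matrix is never written to GPU high-bandwidth memory, which makes it both faster and more memory-efficient than standard attention (memory grows linearly rather than quadratically with sequence length). Essential infrastructure.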
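
As a rough illustration of how exact attention can be computed without ever storing the full score matrix, here is a minimal pure-Python sketch of block-wise attention with a running (online) softmax. It is not the paper's CUDA kernel: the function names and the `block_size` parameter are illustrative assumptions, only the key/value side is tiled, and nothing here models GPU memory. It only shows that the rescaling trick gives the same result as the standard computation.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds the full row of scores for each query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Block-wise version: visits K/V one block at a time and keeps only a
    running max, a running normalizer, and a running output per query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf       # running max of the scores seen so far
        l = 0.0             # running sum of exp(score - m)
        acc = [0.0] * dv    # running unnormalized output
        for start in range(0, len(K), block_size):
            Kb = K[start:start + block_size]
            Vb = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in Kb]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new max.
            # On the first block this is exp(-inf) == 0.0.
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [a * correction + sum(w * v[j] for w, v in zip(p, Vb))
                   for j, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.5, -1.0]]
    K = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(naive_attention(Q, K, V))
    print(tiled_attention(Q, K, V, block_size=2))
```

The two functions should agree up to floating-point rounding for any block size, which is the sense in which the block-wise computation is exact rather than an approximation.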