FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Dao, Fu, Ermon, Rudra, Ré

paper completed ai-ml

Year 2022

Notes

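IO-aware exact attention (not an approximation): tiling and recomputation cut reads and writes between GPU HBM and on-chip SRAM, so it runs faster and needs memory linear rather than quadratic in sequence length. Essential infrastructure.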
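
A minimal sketch of the numerical idea, assuming the blockwise online-softmax formulation from the paper: keys and values are visited one block at a time, and only a running max, a running denominator, and a running output are kept, so the full score matrix is never stored. The function names and `block_size` are hypothetical, and this pure-Python version only illustrates the arithmetic. The speedup in the paper comes from a fused GPU kernel that keeps blocks in on-chip memory, which this sketch does not model.

```python
import math

def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, building each full score row."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def tiled_attention(Q, K, V, block_size=2):
    """Same result, but K and V are read in blocks (hypothetical helper)."""
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf      # running max of the scores seen so far
        l = 0.0            # running softmax denominator
        acc = [0.0] * dv   # running unnormalized output row
        for start in range(0, len(K), block_size):
            Kb = K[start:start + block_size]
            Vb = V[start:start + block_size]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in Kb]
            m_new = max(m, max(scores))
            scale = math.exp(m - m_new)  # rescale old statistics to the new max
            weights = [math.exp(s - m_new) for s in scores]
            l = l * scale + sum(weights)
            acc = [a * scale + sum(w * v[j] for w, v in zip(weights, Vb))
                   for j, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])
    return out

# Small made-up inputs: 2 queries, 5 keys/values, head dimension 3.
Q = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
K = [[0.5, 0.1, -0.2], [0.3, 0.9, 0.4], [-1.0, 0.2, 0.7],
     [0.0, 0.0, 1.0], [2.0, -0.5, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]

reference = naive_attention(Q, K, V)
tiled = tiled_attention(Q, K, V, block_size=2)
max_diff = max(abs(a - b) for r, t in zip(reference, tiled) for a, b in zip(r, t))
```

Here `max_diff` should be at floating-point rounding level, which is the sense in which the blockwise computation is exact rather than approximate.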