April 24, 2026
Generating Long Sequences with Sparse Transformers
Notes
Sparse attention patterns for long-range dependencies, reducing the cost of self-attention from O(n²) to O(n√n) in sequence length n.
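To make the O(n√n) figure concrete, here is a minimal sketch of one strided sparse pattern. It assumes a causal setting in which each query position attends to a local window of the last `stride` positions plus every `stride`-th earlier position, with `stride` chosen near √n. The function names, the exact window boundaries, and the merging of both patterns into a single set are illustrative assumptions, not the paper's implementation.

```python
import math


def strided_attention_indices(n, stride):
    """Return, for each query position i, the sorted key positions it attends to.

    Assumed pattern (a simplification for illustration):
      - local:   the last `stride` positions up to and including i
      - strided: every earlier position j with (i - j) % stride == 0
    Only positions j <= i are allowed (causal attention).
    """
    rows = []
    for i in range(n):
        local = set(range(max(0, i - stride + 1), i + 1))
        strided = {j for j in range(i + 1) if (i - j) % stride == 0}
        rows.append(sorted(local | strided))
    return rows


def count_attended_pairs(rows):
    """Total number of (query, key) pairs that the pattern computes."""
    return sum(len(r) for r in rows)


def dense_causal_pairs(n):
    """Number of (query, key) pairs in full causal attention: n(n+1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (16, 256, 4096):
        stride = math.isqrt(n)  # stride close to sqrt(n)
        sparse = count_attended_pairs(strided_attention_indices(n, stride))
        print(n, stride, sparse, dense_causal_pairs(n))
```

With `stride` near √n, each row holds at most about 2√n keys, so the total number of attended pairs grows as roughly n·2√n rather than n². For n = 16 and stride = 4, this sketch counts 82 pairs against 136 for dense causal attention, and the gap widens quickly as n grows.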