April 24, 2026
Generating Long Sequences with Sparse Transformers
Notes
Sparse attention patterns for long-range dependencies, reducing the cost of attention from O(n²) to O(n√n).
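
To make the O(n√n) figure concrete, here is a minimal sketch that builds one strided sparse pattern and counts its connections against dense causal attention. It assumes a causal mask that combines a local window of the previous `stride` positions with every `stride`-th earlier position, and it picks `stride` close to √n. The function names `strided_sparse_mask` and `count_connections` are hypothetical, chosen for illustration; this is not code from the paper.

```python
import math


def strided_sparse_mask(n, stride=None):
    """Build an n x n boolean causal mask (illustrative sketch).

    Position i may attend to position j <= i when either
      - j lies within the previous `stride` positions (local window), or
      - (i - j) is a multiple of `stride` (strided pattern).
    Assumption: stride defaults to floor(sqrt(n)).
    """
    if stride is None:
        stride = max(1, math.isqrt(n))
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: never look ahead of position i
            is_local = (i - j) < stride
            is_strided = (i - j) % stride == 0
            mask[i][j] = is_local or is_strided
    return mask


def count_connections(mask):
    """Count the allowed (query, key) pairs in the mask."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n = 1024
    sparse = count_connections(strided_sparse_mask(n))
    dense = n * (n + 1) // 2  # dense causal attention
    print("sparse connections:", sparse)
    print("dense connections: ", dense)
    print("2 * n * sqrt(n):   ", 2 * n * math.isqrt(n))
```

Each row allows at most `stride` local positions plus about n / `stride` strided ones, so with `stride` near √n the per-row cost is on the order of √n and the total is on the order of n√n.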