Generating Long Sequences with Sparse Transformers

Child, Gray, Radford, Sutskever
paper completed ai-ml

Notes

Sparse factorizations of the attention matrix for modeling long-range dependencies. Cuts self-attention cost from O(n²) to O(n√n).
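
A minimal sketch of why a sparse pattern gives roughly O(n√n) work, assuming the paper's strided pattern: each position attends to a local window of about √n previous positions plus every √n-th earlier position. The function name `strided_sparse_mask`, the `stride` parameter, and the dense NumPy mask are illustrative assumptions, not the authors' implementation; a real implementation would avoid materializing the full n × n mask.

```python
import math
import numpy as np

def strided_sparse_mask(n, stride=None):
    """Return a boolean (n, n) mask; mask[i, j] is True if position i may attend to j.

    Hypothetical helper for illustration only. Merges the two strided
    sub-patterns (local window and every stride-th position) into one mask.
    """
    if stride is None:
        stride = max(1, math.isqrt(n))  # assumption: stride close to sqrt(n)
    i = np.arange(n)[:, None]  # query positions (rows)
    j = np.arange(n)[None, :]  # key positions (columns)
    causal = j <= i                   # autoregressive: never look ahead
    local = (i - j) <= stride         # the previous `stride` positions and self
    strided = (i - j) % stride == 0   # every `stride`-th earlier position
    return causal & (local | strided)

n = 1024
mask = strided_sparse_mask(n)
dense_pairs = n * (n + 1) // 2   # pairs scored by full causal attention
sparse_pairs = int(mask.sum())   # pairs scored under the sparse pattern
print(f"dense: {dense_pairs}, sparse: {sparse_pairs}")
```

Each row of the mask allows at most 2 · stride + 1 positions, so with stride ≈ √n the total number of scored pairs grows like n√n rather than n².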