April 24, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Notes
Mixture of Experts with learned gating. Conditional computation at scale.