Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Shazeer, Mirhoseini, Maziarz, Davis, Le, Hinton, Dean
Notes
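Mixture-of-experts (MoE) layer with a learned, sparse gating network: for each input, the gate picks a few experts out of many and only those are evaluated. This is conditional computation at scale: model capacity grows with the number of experts while per-example compute stays roughly constant.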
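
A minimal sketch of the gating idea, in plain Python: a linear gate scores every expert, only the top k scores go through a softmax, and only those k experts are run. The names `top_k_gates`, `moe_layer`, `gate_weights` and `experts` are hypothetical, chosen for illustration. The paper's gate also adds tunable noise before the top-k step and uses auxiliary load-balancing losses; both are left out here, so treat this as an illustration rather than the authors' implementation.

```python
import math

def top_k_gates(logits, k):
    """Return {expert_index: gate} for the k largest logits.

    The softmax is taken over the selected logits only, so every
    unselected expert has an implicit gate of exactly zero.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def moe_layer(x, gate_weights, experts, k=2):
    """Gate-weighted sum of the outputs of the k selected experts.

    x            : list of floats (one input vector)
    gate_weights : one weight vector per expert (assumed: linear gate, no noise)
    experts      : list of callables mapping a vector to a vector
    """
    # One gating logit per expert: dot product of x with that expert's weights.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gates(logits, k)

    out = None
    for i, g in gates.items():
        # Conditional computation: only the selected experts are evaluated.
        scaled = [g * yi for yi in experts[i](x)]
        out = scaled if out is None else [o + s for o, s in zip(out, scaled)]
    return out, gates
```

With four experts and k=2, two experts are never called for a given input. That skipped work is where the compute saving comes from.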