April 24, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Notes
Mixture of Experts with learned gating. Conditional computation at scale.
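The core idea is a gating network that routes each input to only a few experts, so compute grows with the number of active experts rather than the total. A minimal sketch of top-k gating in plain Python (names and the k=2 choice are illustrative, not from the paper's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topk_gate(logits, k=2):
    """Keep the top-k expert logits, softmax over just those,
    and zero out the gates for all other experts (sparsity)."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    kept = softmax([logits[i] for i in idx])
    gates = [0.0] * len(logits)
    for i, g in zip(idx, kept):
        gates[i] = g
    return gates

# 4 experts, route to the top 2: only two gates are nonzero,
# and the nonzero gates sum to 1.
gates = topk_gate([1.0, 3.0, 0.5, 2.0], k=2)
```

In the paper the gating logits also get learned noise before the top-k cut, which helps load-balance experts during training; that part is omitted here for brevity.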
In 2023 I drafted a paper on routing between a large and small LLM via KL-divergence thresholds. Speculative decoding had already solved the problem more rigorously. Here is the post-mortem.
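The routing idea can be sketched as follows: let the small model draft, occasionally score the same prefix with the large model, and escalate when the two next-token distributions diverge by more than a KL threshold. This is a hypothetical reconstruction of the scheme, not code from the draft; function names and the threshold value are made up:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def route(small_probs, large_probs, threshold=0.5):
    """Escalate to the large model when the small model's next-token
    distribution diverges from the large model's by more than `threshold`."""
    if kl_divergence(large_probs, small_probs) > threshold:
        return "large"
    return "small"

# Agreement -> stay on the small model; disagreement -> escalate.
same = route([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
diff = route([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

The catch, and presumably part of the post-mortem: computing the KL requires running the large model anyway, so the savings only materialize if the check is amortized over many drafted tokens. Speculative decoding resolves this cleanly by having the large model verify a whole block of drafted tokens in one forward pass.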