April 24, 2026
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Notes
Mixture of Experts with learned gating. Conditional computation at scale.
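The core idea is a gating network that routes each input to only a few experts, so compute grows with the number of active experts rather than the total. A minimal sketch of top-k gating in plain Python (names and the k=2 choice are illustrative, not from the paper's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topk_gate(logits, k=2):
    """Keep the top-k expert logits, softmax over just those,
    and zero out the gates for all other experts (sparsity)."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    kept = softmax([logits[i] for i in idx])
    gates = [0.0] * len(logits)
    for i, g in zip(idx, kept):
        gates[i] = g
    return gates

# 4 experts, route to the top 2: only two gates are nonzero,
# and the nonzero gates sum to 1.
gates = topk_gate([1.0, 3.0, 0.5, 2.0], k=2)
```

In the paper the gating logits also get learned noise before the top-k cut, which helps load-balance experts during training; that part is omitted here for brevity.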
In 2023 I drafted a paper on routing between a large and small LLM via KL-divergence thresholds. Speculative decoding had already solved the problem more rigorously. Here is the post-mortem.
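The routing idea can be sketched as follows: let the small model draft, occasionally score the same prefix with the large model, and escalate when the two next-token distributions diverge by more than a KL threshold. This is a hypothetical reconstruction of the scheme, not code from the draft; function names and the threshold value are made up:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def route(small_probs, large_probs, threshold=0.5):
    """Escalate to the large model when the small model's next-token
    distribution diverges from the large model's by more than `threshold`."""
    if kl_divergence(large_probs, small_probs) > threshold:
        return "large"
    return "small"

# Agreement -> stay on the small model; disagreement -> escalate.
same = route([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
diff = route([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

The catch, and presumably part of the post-mortem: computing the KL requires running the large model anyway, so the savings only materialize if the check is amortized over many drafted tokens. Speculative decoding resolves this cleanly by having the large model verify a whole block of drafted tokens in one forward pass.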