Notes

A Formal Theory of Inductive Inference
Solomonoff's foundational paper on algorithmic probability and universal induction. Basis for AIXI.
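
Its central object, written here in modern notation rather than Solomonoff's original typography, is the universal prior: a string's probability is the total weight of every program that makes a universal machine emit output beginning with it, each program weighted by two to the minus its length.

```latex
M(x) \;=\; \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-\ell(p)}
```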

Attention Is All You Need
Introduced the Transformer architecture. The paper that started everything.
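
As a reminder of how small the core operation is, here is a minimal numpy sketch of single-head scaled dot-product attention (no masking, no learned projections; shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq, d_k); V: (seq, d_v). Scores are query-key dot products,
    # scaled by sqrt(d_k) to keep the softmax well-behaved.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted average of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 toy tokens
out = scaled_dot_product_attention(Q, K, V)            # shape (4, 8)
```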

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Bidirectional pre-training via masked language modeling. Defined the pre-train/fine-tune paradigm.
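
A sketch of the masking step, simplified from the paper's recipe (the real procedure also leaves 10% of selected tokens unchanged and swaps 10% for random tokens):

```python
import random

def mask_for_mlm(tokens, p=0.15, mask_token="[MASK]"):
    # Select ~15% of positions, hide them, and record the originals as
    # prediction targets; the model sees context on both sides of each mask.
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < p:
            targets[i] = tok
            masked[i] = mask_token
    return masked, targets

masked, targets = mask_for_mlm("the cat sat on the mat".split())
```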

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Step-by-step reasoning via prompting. Unlocked a new capability class.
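
The whole technique is a prompt format: the few-shot demonstrations include the reasoning, so the model emits intermediate steps before its answer. Paraphrasing the paper's running example:

```text
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
   Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
   6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: <your question>
A:
```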

Constitutional AI: Harmlessness from AI Feedback
Self-critique and revision guided by written principles instead of human labels.

Deep Reinforcement Learning from Human Preferences
Foundational RLHF paper. Learning reward models from pairwise human comparisons.
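
The reward model is fit to the comparisons with a Bradley-Terry style objective: the preferred item should score higher, and the loss is the negative log-sigmoid of the score gap. A numpy sketch (the scores are stand-ins for reward-model outputs):

```python
import numpy as np

def pairwise_reward_loss(r_chosen, r_rejected):
    # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected);
    # the loss is -log of that probability, i.e. log(1 + exp(-gap)).
    return np.mean(np.log1p(np.exp(-(r_chosen - r_rejected))))

# toy reward-model scores for preferred vs. dispreferred responses
r_chosen = np.array([1.2, 0.3, 2.0])
r_rejected = np.array([0.4, 0.5, 1.1])
print(pairwise_reward_loss(r_chosen, r_rejected))
```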

Direct Preference Optimization: Your Language Model Is Secretly a Reward Model
Bypasses reward modeling entirely. Simpler alignment pipeline, comparable results.
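
DPO's loss needs only log-probabilities from the policy and a frozen reference model; the implicit reward is the β-scaled log-ratio between them. A numpy sketch of the per-batch loss (inputs are per-response sequence log-probs):

```python
import numpy as np

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Implicit reward of a response = beta * log(pi(y|x) / pi_ref(y|x)).
    # The loss is -log sigmoid of the reward margin between the chosen (w)
    # and rejected (l) responses; no reward model, no RL rollout.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return np.mean(np.log1p(np.exp(-margin)))

# toy sequence log-probs under the policy and the frozen reference
print(dpo_loss(np.array([-12.0]), np.array([-15.0]),
               np.array([-13.0]), np.array([-14.0])))
```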

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
IO-aware exact attention that is both faster and more memory-efficient. Essential infrastructure.
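
The enabling trick is an online softmax: attention can be computed block by block while carrying only a running max, a normalizer, and a weighted sum, so the full n×n score matrix never materializes. A single-query numpy sketch (tiling over keys/values only; the real kernel also tiles queries and fuses everything on-chip):

```python
import numpy as np

def online_softmax_attention(score_blocks, value_blocks):
    # One query, keys/values streamed in blocks. Keep a running max (m),
    # normalizer (l), and unnormalized output (acc); rescale both whenever
    # a new block raises the max. Memory is O(block size), not O(n).
    m, l, acc = -np.inf, 0.0, 0.0
    for s, v in zip(score_blocks, value_blocks):
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)        # rescale old contributions
        e = np.exp(s - m_new)
        l = l * scale + e.sum()
        acc = acc * scale + e @ v
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
scores, values = rng.normal(size=12), rng.normal(size=(12, 4))
out = online_softmax_attention(
    [scores[i:i + 4] for i in range(0, 12, 4)],
    [values[i:i + 4] for i in range(0, 12, 4)])
dense = np.exp(scores - scores.max()) @ values / np.exp(scores - scores.max()).sum()
assert np.allclose(out, dense)           # identical to the dense result
```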

Generating Long Sequences with Sparse Transformers
Sparse attention patterns for long-range dependencies. O(n√n) attention.
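
A rough picture of where O(n√n) comes from: each position attends to a local window of about √n neighbors plus one position per √n-sized block. A numpy sketch of such a mask (a simplified stand-in for the paper's strided/fixed patterns):

```python
import numpy as np

def sparse_attention_mask(n):
    # Each query attends to (a) a causal local window of ~sqrt(n) positions
    # and (b) one position per sqrt(n)-sized block, so nonzeros per row are
    # O(sqrt(n)) and total work is O(n * sqrt(n)) instead of O(n^2).
    stride = int(np.ceil(np.sqrt(n)))
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - stride + 1):i + 1] = True   # local window
        mask[i, stride - 1:i + 1:stride] = True        # strided positions
    return mask

m = sparse_attention_mask(64)
print(int(m.sum()), "of", 64 * 64, "entries attended")   # ~n * 2*sqrt(n)
```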

Language Models are Few-Shot Learners (GPT-3)
175B parameters. In-context learning emerges at scale. Changed the field.

Language Models are Unsupervised Multitask Learners (GPT-2)
Showed large LMs can perform tasks zero-shot. Introduced the scaling intuition.

LLaMA: Open and Efficient Foundation Language Models
Open-weight models competitive with GPT-3. Catalyzed the open-source LLM ecosystem.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Mixture of Experts with learned gating. Conditional computation at scale.
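
The layer's essence: a learned gate scores all experts per token, but only the top-k actually run, so parameter count grows while per-token compute stays roughly flat. A minimal numpy sketch (linear experts, top-2 gating, no load balancing):

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    # x: (d,) token vector; experts: list of (d, d) matrices; gate_w: (E, d).
    # The gate scores every expert but only the top-k execute; outputs are
    # mixed with softmax weights renormalized over the selected experts.
    logits = gate_w @ x
    top = np.argsort(logits)[-k:]
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, E = 16, 8
experts = [rng.normal(size=(d, d)) for _ in range(E)]
gate_w = rng.normal(size=(E, d))
y = moe_layer(rng.normal(size=d), experts, gate_w)   # only 2 of 8 experts ran
```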

Probabilistic Graphical Models: Principles and Techniques
Koller and Friedman's comprehensive reference on graphical models, inference, and learning.

ReAct: Synergizing Reasoning and Acting in Language Models
Interleaving reasoning traces and actions. The prompting pattern behind most LLM agents.
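
The control loop behind the pattern, as a rough Python sketch; `llm` and `run_tool` are hypothetical stand-ins, not names from the paper's code, and the string parsing is far cruder than a production agent's:

```python
def react_agent(question, llm, run_tool, max_steps=8):
    # llm(text) -> model completion (str); run_tool(action) -> tool output.
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(context + "Thought:")   # model reasons, then picks an action
        context += "Thought:" + step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" not in step:
            break                          # model failed to emit an action
        action = step.split("Action:")[-1].strip()
        context += f"Observation: {run_tool(action)}\n"  # feed result back in
    return None
```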

Scaling Laws for Neural Language Models
Power-law relationships between compute, data, parameters, and loss. Empirical scaling science.
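
For example, with data and compute unconstrained, the paper fits loss as a power law in parameter count, roughly L(N) = (N_c/N)^0.076 with N_c ≈ 8.8e13. A quick illustration using those fitted constants:

```python
def loss_from_params(n, n_c=8.8e13, alpha_n=0.076):
    # L(N) = (N_c / N) ** alpha_N: loss in nats per token as a function of
    # non-embedding parameter count, other resources unconstrained.
    return (n_c / n) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.2f}")
```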

The Unreasonable Effectiveness of Recurrent Neural Networks
Seminal blog post demonstrating char-level RNN power: Shakespeare, LaTeX, and Linux kernel code generation.

Toolformer: Language Models Can Teach Themselves to Use Tools
LMs learning when and how to call external tools. Key step toward agentic LMs.

Training Compute-Optimal Large Language Models (Chinchilla)
Showed most LLMs were undertrained. Derived the compute-optimal ratio of data to parameters.
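
The fitted recipe comes out near 20 training tokens per parameter, and with the standard C ≈ 6·N·D estimate of training FLOPs the compute-optimal sizes follow by arithmetic. A sketch (constants are the commonly quoted round numbers, not exact fits):

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20):
    # With C ~= 6 * N * D and the fitted D ~= 20 * N, the optimal model size
    # is N = sqrt(C / 120); data scales in lockstep with parameters.
    n = (compute_flops / (6 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

n, d = chinchilla_optimal(5.76e23)   # roughly the Gopher/Chinchilla budget
print(f"{n:.2e} params, {d:.2e} tokens")   # ~7.0e10 params, ~1.4e12 tokens
```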

Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
RLHF applied to GPT-3. The bridge from raw LM to useful assistant.