Reverse-Process Synthetic Data Generation for Math Reasoning
Training LLMs on mathematical reasoning by inverting easy-to-solve problems: differentiate known functions, then reverse each derivative into an integration exercise with a full step-by-step solution.
Part 4 of What Your RL Algorithm Actually Assumes — model-based vs. model-free, the assumptions table, AIXI as the incomputable ideal, and the unifying claim: representation is prior is assumption.
Part 3 of What Your RL Algorithm Actually Assumes — the architecture decides what kind of features can be learned, and that decision is a Bayesian prior over value functions.
Part 2 of What Your RL Algorithm Actually Assumes — how hand-crafted features compress the state space, and what you're betting on when you pick them.
Part 1 of What Your RL Algorithm Actually Assumes — tabular Q-learning makes zero assumptions about state similarity and pays for it in sample complexity.
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
The classical AI curriculum teaches rational agents as utility maximizers. The progression from search to RL to LLMs is really about one thing: finding representations that make decision-making tractable.
Why the simplest forms of learning are incomputable, and what that means for the intelligence we can build.
Modern graduate ML text with causal inference, decision making, and ML foundations. Accessible free textbook with strong conceptual framing.
SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.
A logic programming system that alternates between wake and sleep phases, using LLMs for knowledge generation during wake and compression-based learning during sleep.
Learning fuzzy membership functions and inference rules automatically through gradient descent on soft circuits, instead of hand-crafting them.
Three approaches to computing derivatives: forward-mode AD, reverse-mode AD, and finite differences, each with different trade-offs for numerical computing and machine learning.
Science is search through hypothesis space. Intelligence prunes; testing provides signal. Synthetic worlds could accelerate the loop.
Applying Monte Carlo Tree Search to large language model reasoning, with a formal specification of the algorithm.
Using GMM clustering to improve retrieval in topically diverse knowledge bases.
What if LLMs could remember their own successful reasoning? A simple experiment in trace retrieval, and why 'latent' is the right word.
What if fuzzy logic systems could discover their own rules? An interactive demo of differentiable fuzzy circuits that learn membership functions, rule structure, and rule existence, all via gradient descent.
Solomonoff induction, MDL, speed priors, and neural networks are all special cases of one Bayesian framework with four knobs.
Gradient descent in Euclidean space ignores the geometry of probability distributions. Natural gradient descent uses the Fisher information metric instead. Fisher Flow makes this continuous.
A tiny autodiff library for understanding how backpropagation actually works.
Intelligence as utility maximization under uncertainty. A unifying framework connecting A* search, reinforcement learning, Bayesian networks, and MDPs.
Abstractions let us reason about complex systems despite our cognitive limits. But some systems resist compression entirely.
How the limited capacity of human working memory acts as regularization, shaping our reasoning and possibly preventing cognitive overfitting.
Reverse-mode automatic differentiation is just the chain rule applied systematically. I built one in C++20 to understand what PyTorch and JAX are actually doing.
Encountering ChatGPT during cancer treatment and recognizing the Solomonoff connection. Language models as compression, prediction as intelligence. A personal inflection point reconnecting with AI research after years in survival mode.
Dual numbers extend our number system with an infinitesimal epsilon where epsilon^2 = 0. Evaluating f(x + epsilon) yields f(x) + epsilon * f'(x): the derivative emerges automatically from the algebra.
The problem of predicting what comes next, from compression to language models