Reverse-Process Synthetic Data Generation for Math Reasoning
Training LLMs on mathematical reasoning by inverting easy-to-solve problems: generate derivatives, reverse them into integration exercises with full step-by-step solutions.
Controlled experiments on constraint satisfaction problems. MCTS beats best-of-N only when blind sampling hits a ceiling and the verifier provides a gradient.
The most dramatic possibility in AI might arrive through the most mundane mechanism. Not a beam of sacred light. A sufficiently good build system.
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
Applying Monte Carlo Tree Search to large language model reasoning, with a formal specification of the algorithm.
What if LLMs could remember their own successful reasoning? A simple experiment in trace retrieval, and why 'latent' is the right word.