I’ve been working on a project that applies Monte Carlo Tree Search (MCTS) to LLM-based reasoning. The core insight is that multi-step reasoning can be modeled as a sequential decision problem where MCTS can systematically explore different reasoning paths.
The Problem with Single-Shot Reasoning
When you ask an LLM a complex question, it generates a single response. If that response goes down a wrong path, there’s no recovery. The model commits to its initial approach and follows it to completion, even if better alternatives existed.
MCTS addresses this by building a tree of reasoning paths and using the UCB1 bandit algorithm to balance exploration of new paths with exploitation of promising ones.
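As a small sketch of that balance, UCB1 scores each child by its mean reward plus an exploration bonus; the function below is illustrative, not the library's API (the default c matches the √2 used later in the usage example):

```python
import math

def ucb1(value_sum, visits, parent_visits, c=1.414):
    """UCB1 score: mean reward (exploitation) plus an exploration bonus.

    Unvisited children score infinity so each gets tried at least once.
    """
    if visits == 0:
        return float("inf")
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

As the parent accumulates visits, the bonus for a rarely visited child grows relative to its siblings, pulling the search back toward under-explored reasoning paths.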
How It Works
The system treats reasoning as:
- States: Partial reasoning traces (what’s been written so far)
- Actions: Reasoning continuations (the next step)
- Terminal states: Complete solutions with final answers
- Rewards: Quality assessments of final answers
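One way to model these pieces is as an immutable record of the trace so far; the class and the "Answer:" convention below are illustrative assumptions, not the library's actual types:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A partial reasoning trace: the steps written so far."""
    steps: tuple = ()

    @property
    def is_terminal(self) -> bool:
        # Terminal once a step produces a final answer.
        return any(s.startswith("Answer:") for s in self.steps)

    def apply(self, action: str) -> "State":
        """An action (the next reasoning step) yields a new state."""
        return State(self.steps + (action,))
```

Keeping states immutable means every node in the tree owns a complete, replayable trace, which is what makes reasoning paths comparable and reusable across simulations.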
Each MCTS simulation runs through four phases:
- Selection: Traverse the tree using UCB1 to pick promising paths
- Expansion: Add a new reasoning step via LLM generation
- Rollout: Continue reasoning until reaching a terminal state
- Backpropagation: Update statistics back up the tree
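The four phases compose into one simulation routine. A minimal, self-contained sketch follows; the Node class, callback names, and tuple-of-strings state are my own illustrative assumptions, not the library's interface:

```python
import math

class Node:
    """Hypothetical tree node: one reasoning step plus search statistics."""
    def __init__(self, state, parent=None):
        self.state = state        # partial trace, e.g. a tuple of steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def ucb1(self, c=1.414):
        if self.visits == 0:
            return float("inf")
        return (self.value_sum / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def simulate(root, generate, evaluate, is_terminal, max_depth=5):
    """One MCTS simulation over a reasoning tree."""
    node, path = root, [root]
    # Selection: descend through expanded nodes via UCB1.
    while node.children and not is_terminal(node.state):
        node = max(node.children, key=Node.ucb1)
        path.append(node)
    # Expansion + rollout: grow new LLM-generated steps until terminal
    # (or the depth cap); every rollout node is kept in the tree.
    depth = 0
    while not is_terminal(node.state) and depth < max_depth:
        child = Node(node.state + (generate(node.state),), parent=node)
        node.children.append(child)
        path.append(child)
        node = child
        depth += 1
    # Terminal-only evaluation, then backpropagate along the visited path.
    reward = evaluate(node.state) if is_terminal(node.state) else 0.0
    for n in path:
        n.visits += 1
        n.value_sum += reward
```

Repeated calls to simulate on the same root grow and refine one shared tree, so later simulations can reuse reasoning prefixes discovered earlier.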
Tree-Building Rollouts
A key design choice is using tree-building rollouts. Unlike classic game-playing MCTS, which rolls out with a fast random policy and discards the positions it visits, we add every rollout node to the tree. This preserves the full reasoning trace and allows reuse of reasoning steps in future simulations.
Terminal-Only Evaluation
The evaluator is invoked only on terminal states: LLM-as-judge calls happen only when a complete answer has been produced, so intermediate reasoning states cost nothing to assess.
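A thin wrapper makes this gating explicit; the function below is a sketch of the idea (a judge callback and a call counter, neither of which is the library's real interface):

```python
def make_terminal_evaluator(judge):
    """Wrap an LLM-as-judge callable so it fires only on complete answers.

    `judge` maps a finished reasoning trace to a score; partial traces
    short-circuit to None without any judge call.
    """
    stats = {"judge_calls": 0}

    def evaluate(state, is_terminal):
        if not is_terminal:
            return None           # no LLM-as-judge cost for partial traces
        stats["judge_calls"] += 1
        return judge(state)

    return evaluate, stats
```

With depth-D rollouts, this turns up to D potential judge calls per simulation into exactly one, which is where the cost reduction comes from.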
The Technical Report
I’ve written a formal specification that provides rigorous definitions for all components:
- Formal definitions of states, actions, nodes, and the search tree
- Precise pseudocode for all four MCTS phases
- Clear interfaces for Generator and Evaluator components
- Complexity analysis showing O(KD) tree operations for K simulations with max depth D
The goal was to establish a canonical reference that faithfully captures classical MCTS while adapting it for LLM reasoning.
Usage
The library provides both a fluent API and an interactive shell:
from mcts_reasoning import ReasoningMCTS, get_llm
mcts = (
    ReasoningMCTS()
    .with_llm(get_llm("anthropic"))
    .with_question("What is the optimal sorting algorithm for 1M integers?")
    .with_exploration(1.414)
    .with_max_rollout_depth(5)
)
mcts.search("Let's think step by step...", simulations=50)
print(f"Solution: {mcts.solution}")
print(f"Confidence: {mcts.best_value:.2%}")
Or interactively:
mcts-shell
> ask What is the sum of primes less than 100?
> search 50
> solution
> sample 5 # Get diverse reasoning paths
> consistency 20 # Check solution consistency
Extensions
The tech report discusses several extensions under consideration:
- Extended action spaces: Compress (for long traces), Verify, ToolCall, Backtrack
- Algorithm variants: Progressive widening, RAVE, parallel MCTS
- Graph-based reasoning: DAG structures for problem decomposition and critique-revision cycles
The code is available at github.com/queelius/mcts-reasoning and the technical report is on the papers page.
Discussion