An Explorable Explanation
Watch an LLM Think
How Monte Carlo Tree Search turns single-pass reasoning into a branching exploration of ideas
One Shot, One Chance
Here is a classic logic puzzle:
A says: "B is a knave."
B says: "We are the same type."
Is A a knight or a knave?
(Knights always tell the truth. Knaves always lie.)
Let's ask an LLM to solve it:
The LLM assumed A is a knave and never tested the alternative. It reached a confident answer without checking all the constraints.
What if the LLM could explore both assumptions, and backtrack from the one that leads to a contradiction?
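The systematic check the LLM skipped is small enough to enumerate directly. A minimal sketch (pure illustration, not part of any LLM pipeline) tests every assignment of knight/knave against both statements:

```python
from itertools import product

def consistent(a_is_knight: bool, b_is_knight: bool) -> bool:
    """True if both statements match the speakers' types."""
    # A says: "B is a knave." True iff B is not a knight.
    a_statement = not b_is_knight
    # B says: "We are the same type." True iff A and B match.
    b_statement = a_is_knight == b_is_knight
    # Knights' statements must be true; knaves' must be false.
    return a_statement == a_is_knight and b_statement == b_is_knight

solutions = [(a, b) for a, b in product([True, False], repeat=2) if consistent(a, b)]
print(solutions)  # [(True, False)] -- A is a knight, B is a knave
```

Only one world survives: A is a knight. The LLM's untested assumption was exactly the one that fails.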
The Tree Appears
Instead of committing to a single reasoning path, MCTS keeps many paths alive at once, repeatedly expanding whichever one currently looks most worth exploring. It builds a tree where each branch represents a different line of reasoning.
But how does it decide which branch to explore next? It uses a formula called UCB1:

UCB1 = value/visits + c · √(ln(parent visits) / visits)

The first term (value) measures how promising a path has been so far. The second term (exploration) favors paths that haven't been tried much.
The constant c controls the balance. Try adjusting it:
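In code, UCB1 is a few lines. A sketch (the default c = 1.4 is a common choice, not a fixed rule):

```python
import math

def ucb1(total_value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: average value plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited nodes get tried first
    exploitation = total_value / visits  # how promising so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)  # how under-explored
    return exploitation + exploration

# A strong but heavily visited node vs. a weak, barely tried one:
print(ucb1(8.0, 10, 20))  # high average, small bonus
print(ucb1(0.5, 1, 20))   # low average, big bonus -- wins at c = 1.4
```

Raising c makes the bonus dominate (more exploration); lowering it makes the average dominate (more exploitation).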
Branching Thoughts
Once UCB1 selects a node, the LLM generates a new reasoning step from that point. The tree grows.
Each branch represents a different assumption the LLM is testing. This is structured exploration, not random guessing.
Click any node in the tree to see its full reasoning.
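One way to represent the growing tree is a node that stores a reasoning step plus the statistics UCB1 needs. A hypothetical sketch (field names are illustrative; the real node contents come from the LLM):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    thought: str                  # one reasoning step produced by the LLM
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    total_value: float = 0.0

    def expand(self, thought: str) -> "Node":
        """Attach a new reasoning step as a child of this node."""
        child = Node(thought=thought, parent=self)
        self.children.append(child)
        return child

root = Node("Is A a knight or a knave?")
branch_a = root.expand("Assume A is a knight.")
branch_b = root.expand("Assume A is a knave.")
print(len(root.children))  # 2 -- two competing assumptions
```

Each call to `expand` is one tree-growth step: UCB1 picks the node, the LLM supplies the `thought`.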
Following the Thread
From the expanded node, the LLM keeps reasoning until it reaches an answer or hits a maximum depth. This is called a rollout.
Watch the nodes appear one by one as the LLM follows its reasoning to a conclusion.
How is this different from regular MCTS?
In classical MCTS, rollout nodes are simulated but discarded. Only the evaluation score is kept. Here, every reasoning step is preserved in the tree.
This means we keep a complete record of every reasoning trace the LLM explored.
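A rollout that preserves every step can be sketched as a loop that appends each generated step to the path instead of throwing it away. The scripted `generate_step` stand-in below replaces a real LLM call and is purely illustrative:

```python
def rollout(start_thought, generate_step, max_depth=6):
    """Follow reasoning until an answer appears or depth runs out.
    Unlike classical MCTS, every simulated step is kept."""
    path = [start_thought]
    while len(path) < max_depth:
        step = generate_step(path)
        path.append(step)  # preserved in the tree, not discarded
        if step.startswith("ANSWER:"):
            break
    return path

# Scripted stand-in for the LLM (illustrative only):
script = iter([
    "Then B would be a knight.",
    "But B claims they are the same type -- a contradiction.",
    "ANSWER: A is a knight.",
])
path = rollout("Assume A is a knave.", lambda p: next(script))
print(path[-1])  # ANSWER: A is a knight.
```

The returned `path` is the full reasoning trace; in classical MCTS only its final score would survive.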
What's a Good Answer?
The rollout reaches an answer, but how do we score it? The evaluator checks whether the reasoning is logically consistent with the puzzle's constraints.
A correct derivation with no contradictions scores 1.0. A derivation that hits a contradiction scores 0.0. Weak reasoning that reaches the right answer by luck scores somewhere in between.
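A toy rule-based evaluator makes the scoring scheme concrete (real systems often use an LLM judge or a formal checker; this stand-in just pattern-matches the trace):

```python
def score(trace: list[str]) -> float:
    """Toy evaluator: 1.0 for a clean derivation, 0.0 for a
    contradiction, 0.5 for an inconclusive path."""
    hit_contradiction = any("contradiction" in step.lower() for step in trace)
    reached_answer = any(step.startswith("ANSWER:") for step in trace)
    if hit_contradiction:
        return 0.0  # the path broke a constraint
    if reached_answer:
        return 1.0  # consistent derivation to an answer
    return 0.5      # ran out of depth without concluding

print(score(["Assume A is a knight.", "Then B is a knave.", "ANSWER: A is a knight."]))  # 1.0
```

The exact rubric matters less than the gradient: paths that violate constraints must score below paths that respect them.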
Scores Flow Upward
When a reasoning path reaches a conclusion and gets scored, the result propagates back up the tree. Each ancestor node updates its average value and visit count.
Good answers make nearby paths more attractive for future exploration. The tree learns where to search next.
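Backpropagation is a short walk up the parent chain. A minimal sketch using plain dicts for nodes (illustrative; any node representation works):

```python
def backpropagate(node, value):
    """Add a rollout's score to every ancestor's statistics."""
    while node is not None:
        node["visits"] += 1
        node["total_value"] += value
        node = node["parent"]

# Tiny two-level tree:
root = {"visits": 0, "total_value": 0.0, "parent": None}
leaf = {"visits": 0, "total_value": 0.0, "parent": root}
backpropagate(leaf, 1.0)  # one good rollout
backpropagate(leaf, 0.0)  # one bad rollout
print(root["total_value"] / root["visits"])  # 0.5 -- the running average
```

These updated averages and visit counts are exactly what UCB1 reads on the next selection step, closing the loop.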
Many Paths, One Answer
Which paths should we look at?
After 20 simulations, the tree has explored many reasoning paths. But which ones matter? Different sampling strategies highlight different aspects:
Showing paths with the highest average values.
How do we pick the final answer?
With multiple paths reaching conclusions, we can use voting to decide:
When multiple independent reasoning paths agree, we can be more confident in the answer.
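Majority voting over the concluded paths is a one-liner with a counter. A minimal sketch (the answer strings are illustrative):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common conclusion and the fraction of paths that agree."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

paths = ["A is a knight", "A is a knight", "A is a knave", "A is a knight"]
print(majority_vote(paths))  # ('A is a knight', 0.75)
```

The agreement fraction doubles as a rough confidence signal: a 3-to-1 split is more trustworthy than a 2-to-2 tie.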
Try It Yourself
This section requires a local Ollama instance to run live MCTS searches.
Connect Ollama at localhost:11434 to try this.
Discussion