The-Learning-Problem

The AI Course: Everything is Utility Maximization

March 12, 2024

I took an AI course this semester. The individual topics weren’t new to me, but the unifying framework was the real payoff.

The organizing principle: intelligence is utility maximization under uncertainty.

This single idea connects everything from A* search to reinforcement learning to Bayesian networks.

Classical Search as Utility

We started with basic search algorithms:

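Depth-first search: Minimize memory while exploring.
Breadth-first search: Guarantee the path with the fewest steps (the shortest path when every step costs the same).
A* search: Minimize total path cost, guided by a heuristic estimate of the remaining cost.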

These aren’t just algorithms. They’re optimization strategies for different utility functions. A* is provably optimal when its heuristic is admissible, meaning it never overestimates the remaining cost to the goal: the path it returns is a least-cost path, and the heuristic steers it away from wasted exploration.
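
To make the A* claim concrete, here is a minimal sketch on a tiny hand-made graph. The graph, the heuristic table, and the names a_star, GRAPH, and H are all made up for illustration; the heuristic values were chosen so that none of them overestimates the true remaining cost.

```python
import heapq

def a_star(graph, heuristic, start, goal):
    """Return (cost, path) for a least-cost path, or None if unreachable.

    graph: dict mapping node -> list of (neighbor, edge_cost) pairs.
    heuristic: dict mapping node -> estimated remaining cost to the goal.
    """
    # Frontier entries are (f, g, node, path), where f = g + h.
    frontier = [(heuristic[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                entry = (new_g + heuristic[neighbor], new_g, neighbor, path + [neighbor])
                heapq.heappush(frontier, entry)
    return None

# Hypothetical example graph; edge costs are arbitrary.
GRAPH = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
# Hypothetical admissible heuristic (true remaining costs are S=5, A=4, B=2, G=0).
H = {"S": 4, "A": 3, "B": 2, "G": 0}

result = a_star(GRAPH, H, "S", "G")
```

On this graph the call returns a cost of 5 along S, A, B, G, even though the direct edges S to B and A to G look tempting. Setting every heuristic value to 0 turns the same code into uniform-cost search, which finds the same path but gets no guidance toward the goal.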

MDPs: Utility Over Time

Markov Decision Processes formalize sequential decision making:

States: Where you are
Actions: What you can do
Transitions: Where actions lead (probabilistically)
Rewards: Immediate utility
Policy: Strategy mapping states to actions

Goal: Find a policy that maximizes expected cumulative reward.

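This is utility maximization with stochastic outcomes and temporal credit assignment: rewards can arrive long after the actions that earned them. The exploration-exploitation tradeoff enters once the model is unknown, which is the reinforcement learning setting.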

The Bellman equation makes it tractable:

V(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V(s')]

Optimal value = immediate reward + discounted expected future value, under the best action.
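
One way to see the Bellman equation at work is value iteration: apply the right-hand side repeatedly until the values stop changing. The sketch below assumes a made-up two-state MDP; the states, actions, probabilities, rewards, and the names TRANSITIONS, REWARDS, and value_iteration are all hypothetical.

```python
GAMMA = 0.9

# TRANSITIONS[state][action] = list of (probability, next_state) pairs.
TRANSITIONS = {
    0: {"stay": [(1.0, 0)], "go": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)], "go": [(1.0, 0)]},
}
# REWARDS[state][action] = immediate reward R(s, a).
REWARDS = {
    0: {"stay": 0.0, "go": 0.0},
    1: {"stay": 1.0, "go": 0.0},
}

def q_value(values, state, action):
    """R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s')."""
    expected_next = sum(p * values[s2] for p, s2 in TRANSITIONS[state][action])
    return REWARDS[state][action] + GAMMA * expected_next

def value_iteration(tolerance=1e-10):
    """Repeat the Bellman backup until the largest change is below tolerance."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_values = {
            s: max(q_value(values, s, a) for a in TRANSITIONS[s])
            for s in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tolerance:
            return values

def greedy_policy(values):
    """Pick, in each state, the action that maximizes the backed-up value."""
    return {
        s: max(TRANSITIONS[s], key=lambda a: q_value(values, s, a))
        for s in TRANSITIONS
    }

values = value_iteration()
policy = greedy_policy(values)
```

For this toy MDP the values settle at V(1) = 10 and V(0) ≈ 8.78, and the greedy policy is to go in state 0 and stay in state 1. Both numbers can be checked by hand: V(1) = 1 + 0.9 V(1), and V(0) = 0.9 (0.8 V(1) + 0.2 V(0)).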

Reinforcement Learning: Learning Utility

RL takes it further. You don’t know the transition dynamics or the reward function. You have to explore to discover what states exist, learn which actions lead where, estimate reward structures, and optimize your policy while still learning.

Q-learning is simple and satisfying:

Q(s,a) <- Q(s,a) + α [r + γ max_{a'} Q(s',a') - Q(s,a)]

Nudge your estimate of the action’s value toward the observed reward plus the discounted best estimate for the next state. The learning rate α sets the size of the nudge.

This is meta-utility maximization: optimizing a learning process that itself optimizes utility.
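
The Q-learning update can be sketched in a few lines of tabular code. Everything about the environment here is an assumption for illustration: a deterministic four-cell chain where only reaching the last cell pays off, epsilon-greedy exploration, and hand-picked values for ALPHA, GAMMA, and EPSILON. The agent never reads the step function’s rules directly; it only sees the transitions it samples.

```python
import random

N_STATES = 4                     # states 0..3; state 3 is the terminal goal
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3   # hypothetical hyperparameters

def step(state, action):
    """Deterministic chain: move one cell; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=2000, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Terminal states have no future value to bootstrap from.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * future
            # Q(s,a) <- Q(s,a) + alpha * [target - Q(s,a)]
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q_table = train()
policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)])
    for s in range(N_STATES - 1)
}
```

After training, the learned values for moving right should approach 0.81, 0.9, and 1.0 in states 0, 1, and 2, which are the discounted distances to the goal, and the greedy policy should be to move right everywhere.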

Bayesian Networks: Reasoning as Utility

Bayesian networks model belief and inference:

Represent uncertainty via probability distributions
Update beliefs via Bayes’ rule
Make decisions that maximize expected utility given beliefs
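
The three steps above can be sketched for the smallest possible case: one binary hypothesis and one observation, rather than a full network. The scenario, the probabilities, the utility table, and the function names are all invented for illustration.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis H given one observation."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

def best_action(belief, utilities):
    """Pick the action with the highest expected utility under P(H) = belief.

    utilities: dict mapping action -> (utility_if_H_true, utility_if_H_false).
    """
    def expected(action):
        u_true, u_false = utilities[action]
        return belief * u_true + (1 - belief) * u_false
    return max(utilities, key=expected)

# Hypothetical numbers: H = "it will rain", observation = "dark clouds".
PRIOR = 0.2
P_CLOUDS_IF_RAIN = 0.9
P_CLOUDS_IF_DRY = 0.3
UTILITIES = {"umbrella": (8, 6), "no_umbrella": (0, 10)}

belief = posterior(PRIOR, P_CLOUDS_IF_RAIN, P_CLOUDS_IF_DRY)
action_before = best_action(PRIOR, UTILITIES)
action_after = best_action(belief, UTILITIES)
```

With these numbers the belief in rain rises from 0.2 to 3/7 (about 0.43) after seeing clouds, and the best action flips from leaving the umbrella at home to carrying it. The belief update and the decision stay separate: Bayes’ rule changes only the probabilities, and the utility table decides what to do with them.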

Even reasoning becomes utility maximization: given limited computation, how do you allocate inference steps to maximize decision quality?

This connects to bounded rationality. Real intelligence isn’t perfect optimization. It’s good-enough optimization under resource constraints.

The Unifying View

Seeing everything through utility maximization reveals structure:

Search = utility maximization with known, deterministic environments.
Planning = utility maximization with known transition models.
Reinforcement learning = utility maximization with unknown environments.
Supervised learning = utility maximization of prediction accuracy.
Unsupervised learning = utility maximization of reconstruction or likelihood.