The AI Course: Everything is Utility Maximization

I took an AI course this semester. Much of the material wasn’t new to me, but the unifying framework was the real payoff.

The organizing principle: intelligence is utility maximization under uncertainty.

This single idea connects everything from A* search to reinforcement learning to Bayesian networks.

Classical Search as Utility

We started with basic search algorithms:

  • Depth-first search: Minimize memory while exploring.
  • Breadth-first search: Guarantee shortest-path discovery.
  • A* search: Minimize total cost using heuristics.

These aren’t just algorithms. They’re optimization strategies for different utility functions. A* is provably optimal when your heuristic is admissible: it maximizes progress toward the goal while minimizing wasted exploration.
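Here is a minimal sketch of that optimality in action. The grid, the `neighbors` function, and the Manhattan heuristic are all hypothetical; the point is that with an admissible heuristic (one that never overestimates the remaining cost), the first time A* pops the goal, the path cost is optimal.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """A* search: expand nodes in order of f(n) = g(n) + h(n).

    With an admissible heuristic, the first pop of the goal is optimal.
    """
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost-to-reach for each node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, step_cost in neighbors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt), g2, nxt, path + [nxt]))
    return None

# Toy example: 3x3 grid, moves right/down only, unit step costs.
def neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1) for dx, dy in [(1, 0), (0, 1)]
            if 0 <= x + dx < 3 and 0 <= y + dy < 3]

def manhattan(p):
    return (2 - p[0]) + (2 - p[1])  # admissible: never overestimates

cost, path = a_star((0, 0), (2, 2), neighbors, manhattan)
# cost == 4 (four unit moves), path visits 5 cells
```

The heuristic is exactly the utility-shaping device: it biases exploration toward the goal without ever lying about the remaining cost.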

MDPs: Utility Over Time

Markov Decision Processes formalize sequential decision making:

  • States: Where you are
  • Actions: What you can do
  • Transitions: Where actions lead (probabilistically)
  • Rewards: Immediate utility
  • Policy: Strategy mapping states to actions

Goal: Find a policy that maximizes expected cumulative reward.

This is utility maximization with stochasticity, temporal credit assignment, and exploration-exploitation tradeoffs.

The Bellman equation makes it tractable:

V(s) = max_a [R(s,a) + γ Σ_{s’} P(s’|s,a) V(s’)]

Optimal value = immediate reward + discounted future value.
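Iterating that Bellman update until it stops changing is value iteration. Here is a minimal sketch on a hypothetical two-state MDP (states, rewards, and transitions are invented for illustration):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Repeat the Bellman backup until convergence:
    V(s) = max_a [R(s,a) + gamma * sum_{s'} P(s'|s,a) * V(s')]
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                    for a in actions)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Hypothetical MDP: from 'A', 'go' reaches 'B' for reward 1;
# in 'B', 'stay' pays reward 2 forever.
states, actions = ["A", "B"], ["stay", "go"]
P = {("A", "stay"): [("A", 1.0)], ("A", "go"): [("B", 1.0)],
     ("B", "stay"): [("B", 1.0)], ("B", "go"): [("A", 1.0)]}
R = {("A", "stay"): 0.0, ("A", "go"): 1.0,
     ("B", "stay"): 2.0, ("B", "go"): 0.0}
V = value_iteration(states, actions, P, R)
# V["B"] ≈ 2 / (1 - 0.9) = 20; V["A"] ≈ 1 + 0.9 * 20 = 19
```

The discount factor γ is doing the "utility over time" work: it trades immediate reward against the geometric series of future rewards.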

Reinforcement Learning: Learning Utility

RL takes it further. You don’t know the transition dynamics or the reward function. You have to explore to discover what states exist, learn which actions lead where, estimate reward structures, and optimize your policy while still learning.

Q-learning is simple and satisfying:

Q(s,a) <- Q(s,a) + α[r + γ max_a’ Q(s’,a’) - Q(s,a)]

Update your estimate of action value based on observed reward plus best future estimate.
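A single tabular Q-learning step is just that one line of arithmetic. A minimal sketch, with a hypothetical 3-state chain and made-up action names:

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: nudge Q(s,a) toward the TD target
    r + gamma * max_a' Q(s', a'), with step size alpha."""
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical chain of 3 states; all estimates start at zero.
actions = ["left", "right"]
Q = {(s, a): 0.0 for s in range(3) for a in actions}

# Observed transition: in state 0, "right" earned reward 1 and led to state 1.
q_update(Q, 0, "right", 1.0, 1, actions)
# Target = 1 + 0.9 * 0 = 1; Q moves halfway (alpha=0.5) from 0 toward 1 → 0.5
```

Crucially, the update needs only the observed transition (s, a, r, s'), not the transition model P — that is what makes it model-free.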

This is meta-utility maximization: optimizing a learning process that itself optimizes utility.

Bayesian Networks: Reasoning as Utility

Bayesian networks model belief and inference:

  • Represent uncertainty via probability distributions
  • Update beliefs via Bayes’ rule
  • Make decisions that maximize expected utility given beliefs
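Those three bullets fit in a few lines of code. A minimal sketch, with a hypothetical weather example (the priors, likelihoods, and utilities are all invented for illustration):

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(h | e) ∝ P(e | h) * P(h), normalized over hypotheses."""
    unnorm = {h: prior[h] * likelihood[h][evidence] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def best_action(beliefs, utility, actions):
    """Choose the action with the highest expected utility under current beliefs."""
    return max(actions, key=lambda a: sum(beliefs[h] * utility[(a, h)]
                                          for h in beliefs))

# Update belief about rain after observing clouds, then decide on an umbrella.
prior = {"rain": 0.3, "sun": 0.7}
likelihood = {"rain": {"clouds": 0.8}, "sun": {"clouds": 0.2}}
beliefs = posterior(prior, likelihood, "clouds")  # P(rain | clouds) ≈ 0.63

utility = {("umbrella", "rain"): 0, ("umbrella", "sun"): -1,
           ("none", "rain"): -10, ("none", "sun"): 0}
act = best_action(beliefs, utility, ["umbrella", "none"])  # "umbrella"
```

The decision flips with the evidence: under the prior alone (rain only 30% likely), skipping the umbrella would win; the observation shifts the expected-utility calculation.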

Even reasoning becomes utility maximization: given limited computation, how do you allocate inference steps to maximize decision quality?

This connects to bounded rationality. Real intelligence isn’t perfect optimization. It’s good-enough optimization under resource constraints.

The Unifying View

Seeing everything through utility maximization reveals structure:

  • Search = utility maximization with known, deterministic environments.
  • Planning = utility maximization with known transition models.
  • Reinforcement learning = utility maximization with unknown environments.
  • Supervised learning = utility maximization of prediction accuracy.
  • Unsupervised learning = utility maximization of reconstruction or likelihood.

Utility functions all the way down.

Why This Matters

This framing has real implications.

Alignment is utility specification. If AI systems maximize utility, good outcomes require specifying the right utility function. This is harder than it sounds. Proxy metrics get Goodharted. Simple objectives miss nuance. Human values are complex and sometimes contradictory.

Intelligence is optimization power. More intelligence means better optimization of whatever utility function you have. Capability and alignment are separate problems. You can have very capable systems optimizing the wrong thing.

Multi-agent scenarios complicate everything. Multiple agents optimizing different utilities require game theory, negotiation, aggregate social welfare functions. Real-world AI is multi-agent with conflicting objectives.

Computational limits matter. Perfect utility maximization is often intractable (PSPACE-hard or worse). Real intelligence is about approximate optimization under constraints: limited computation, limited information, limited time, bounded memory. This is where heuristics and satisficing come in.

Connection to My Research

This framework connects to my complex networks research. If I model AI systems as maximizing utility through conversation:

  • Reasoning traces are search through concept space
  • Knowledge graphs encode transition models
  • Attention patterns show utility gradients
  • Conversation structure reveals optimization strategies

I can analyze where systems get stuck in local optima, how values propagate through interaction, and what utility functions are actually being optimized.

The Philosophical Dimension

Framing intelligence as utility maximization raises uncomfortable questions.

What should we maximize? Happiness? Preference satisfaction? Objective goods? Whose preferences?

How do we aggregate utilities? Utilitarian sum? Prioritarian weighting? Maximin? Rights constraints?
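Two of those aggregation rules already disagree on trivial examples. A minimal sketch, with made-up per-person utilities under two hypothetical policies:

```python
# Hypothetical per-person utilities under two policies.
utilities = {"status_quo": [5, 5, 5], "growth": [1, 9, 9]}

# Utilitarian sum: maximize total welfare.
utilitarian_choice = max(utilities, key=lambda p: sum(utilities[p]))
# Maximin (Rawlsian): maximize the welfare of the worst-off person.
maximin_choice = max(utilities, key=lambda p: min(utilities[p]))

# Utilitarian sum picks "growth" (19 vs 15);
# maximin picks "status_quo" (worst-off gets 5 vs 1).
```

Same numbers, two defensible rules, opposite answers — which is exactly why "just maximize utility" underdetermines the engineering problem.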

What about non-utility values? Deontological constraints, virtue ethics, procedural fairness. Not everything reduces to a scalar.

Can suffering be offset? Is one person’s extreme suffering worth many people’s mild happiness? I say no. But utilitarianism says yes.

These aren’t just philosophical puzzles. They’re engineering requirements for AI systems that actually get deployed.

Where This Leaves Me

Intelligence is optimization. Once you see it, you can’t unsee it. The critical problem isn’t building capable optimizers. We’re getting good at that. The critical problem is making sure the thing being optimized is what we actually want.


Everything is utility maximization. The question is: whose utility, and at what cost?
