On Intelligence: The Gap Where Safety Is Decided

June 10, 2026. By Alex Towell (queelius@gmail.com, https://metafunctor.com). Updated: July 3, 2026

We built machines that can write, hold a conversation, and work through a problem out loud, and we did it before we had a clear account of what any of those verbs mean. The engineering ran ahead of the understanding. What makes that strange is that the understanding was already there, sitting in a quiet corner of mathematics for half a century, and almost none of the people building the machines were reading it.

There is a clean answer to the question of what intelligence is. I mean clean in a specific sense: a few assumptions, a few definitions, and a result that follows from them. It arrives in three moves.

The first move is Bayes. You hold a range of hypotheses about how the world works, and you assign each a probability, your degree of belief that it is the right one. Evidence comes in. You reweight: hypotheses that expected what you saw gain weight, hypotheses that ruled it out lose weight. Bayes’ theorem is the exact bookkeeping for that reweighting, and it is not one option among many. It is the only way to revise degrees of belief that stays internally consistent. But it leaves a hole. It tells you how to update your beliefs, not which beliefs to start with. Where does the prior come from?
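To make the bookkeeping concrete, here is a minimal sketch of one round of reweighting. The coin hypotheses, their names, and the uniform starting weights are my own illustrative assumptions, not anything the theory dictates. Note that the starting weights are simply handed in, which is exactly the hole described above.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis name -> probability
    likelihood: function (hypothesis, observation) -> P(observation | hypothesis)
    """
    # Weight each hypothesis by how strongly it expected what was seen.
    unnormalized = {h: p * likelihood(h, observation) for h, p in prior.items()}
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("every hypothesis ruled out this observation")
    # Renormalize so the beliefs sum to 1 again.
    return {h: w / evidence for h, w in unnormalized.items()}

# Hypothetical example: three guesses about a coin's probability of heads.
BIAS = {
    "fair": Fraction(1, 2),
    "heads_heavy": Fraction(9, 10),
    "tails_heavy": Fraction(1, 10),
}

def coin_likelihood(hypothesis, flip):
    p_heads = BIAS[hypothesis]
    return p_heads if flip == "H" else 1 - p_heads

# The prior is an assumption: Bayes does not say where it comes from.
belief = {h: Fraction(1, 3) for h in BIAS}
for flip in "HHH":
    belief = bayes_update(belief, coin_likelihood, flip)
```

After three heads in a row, most of the weight has moved to the heads-heavy hypothesis, the fair coin keeps a modest share, and the tails-heavy hypothesis is nearly gone without ever being strictly ruled out.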

The second move, Solomonoff’s, fills the hole. Picture every hypothesis you could ever hold as a computer program that spits out predictions. Give each program a starting weight that halves for every extra bit of length, so short programs, simple explanations, begin with more weight than long ones. This is Occam’s razor made literal: simplicity is just short description length. Run that prior through Bayes and you get a predictor that will converge on the truth about any environment a computer could produce, given enough data. It is, in a precise sense, the best possible learner from experience. The price is steep. To actually use it you would have to run every program at once, including the endless supply of programs that never halt. It is uncomputable. You can write down exactly what it is and never once execute it.
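The real construction cannot be run, but a toy version shows the shape of the idea. In the sketch below, a "program" is just a short bit pattern that prints itself over and over, and the class is capped at a maximum length. Both choices are my own simplifications so that the loop terminates; this is a computable stand-in, not Solomonoff induction itself.

```python
from fractions import Fraction
from itertools import product

def toy_programs(max_len):
    """All bit strings up to max_len; each toy 'program' repeats its pattern forever."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def output(program, length):
    """First `length` bits printed by the toy program."""
    return "".join(program[i % len(program)] for i in range(length))

def predict_next(observed, max_len=6):
    """Probability that the next bit is '1' under the length-weighted mixture."""
    weight_one = Fraction(0)
    weight_total = Fraction(0)
    for program in toy_programs(max_len):
        out = output(program, len(observed) + 1)
        # Toy programs are deterministic, so the Bayesian update is a filter:
        # a program either reproduces the data or drops out entirely.
        if out[:len(observed)] != observed:
            continue
        # Starting weight halves with every extra bit of program length.
        weight = Fraction(1, 2 ** len(program))
        weight_total += weight
        if out[-1] == "1":
            weight_one += weight
    if weight_total == 0:
        raise ValueError("no toy program reproduces the observed bits")
    return weight_one / weight_total
```

With the default cap of six bits, `predict_next("0101")` comes out to 1/7. The short pattern `01` dominates and says the next bit is 0, while a few longer patterns that agree with the data so far still hold out for a 1. That is Occam's razor acting as a weighting rather than a veto.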

The third move turns a predictor into an agent. Prediction is one half of intelligence; the other half is choosing what to do. Hand the agent a reward signal and a single rule: at each step, take the action with the highest expected future reward, averaged over every way the world might turn out, each weighted by how plausible it is. Use Solomonoff’s predictor to supply those weights. What you get is AIXI, Marcus Hutter’s definition of an optimal agent. Inside its assumptions, a computable environment and a reward that is simply given, no agent does systematically better. It is a precise statement of what perfect intelligence would be. It is also more uncomputable than the predictor underneath it, because now you must also imagine every possible future under every possible action.
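The decision rule itself is short enough to write down. The sketch below is a one-step version only: it looks a single action ahead, and the plausibility weights are typed in by hand instead of coming from Solomonoff's predictor. The lever worlds and their payoffs are hypothetical. AIXI applies the same rule over every computable environment and every future sequence of actions, which is where the uncomputability comes from.

```python
def best_action(actions, environments):
    """Pick the action with the highest plausibility-weighted expected reward.

    environments: list of (weight, reward_fn) pairs, with weights summing to 1,
    where reward_fn(action) is the reward that world would pay for that action.
    """
    def expected_reward(action):
        return sum(weight * reward_fn(action) for weight, reward_fn in environments)
    return max(actions, key=expected_reward)

# Hypothetical worlds: pulling a lever either pays 1 or costs 5; waiting pays 0.
def lever_pays(action):
    return 1.0 if action == "pull" else 0.0

def lever_shocks(action):
    return -5.0 if action == "pull" else 0.0

ACTIONS = ["pull", "wait"]

# Same worlds, different beliefs about which one is real.
optimistic = [(0.9, lever_pays), (0.1, lever_shocks)]
cautious = [(0.7, lever_pays), (0.3, lever_shocks)]
```

Under the optimistic weights the rule pulls the lever; under the cautious weights it waits. Nothing in the rule asks whether the reward numbers were the right ones to hand it.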

So the theory hands you a definition, not a device. AIXI is a limit you can point at and measure against. You cannot build it and you never will.

Now the thing we actually built. Underneath the chat window, a large language model is a fixed function with a few hundred billion tunable numbers, trained on one monotonous task: read a stretch of text, predict the token that comes next, and nudge the numbers whenever the guess is wrong. Do that across a large fraction of the written internet. That is the whole objective. No world model is handed to it, no reward function is written down, no search over futures is performed. Whatever skill it has was pressed into the weights by prediction alone. Afterward we tune it with human feedback, rewarding the answers people rate as useful, so it drifts toward responses we approve of.
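The training loop can be shown in miniature. The sketch below is not a language model in any serious sense: characters stand in for tokens, the context is a single previous character, and the "tunable numbers" are one small table of scores. Those are all my assumptions for the sake of a runnable example. The nudge is the standard gradient step for a softmax predictor under cross-entropy loss, so it is proportional to how wrong the guess was.

```python
import math

def train_bigram(text, epochs=200, lr=0.5):
    """Fit a table of next-character scores by repeated next-token prediction."""
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    n = len(vocab)
    # logits[prev][nxt]: the tunable numbers, all starting at zero.
    logits = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]
    for _ in range(epochs):
        for prev, nxt in pairs:
            row = logits[prev]
            # Softmax turns scores into a probability for each next character.
            top = max(row)
            exps = [math.exp(v - top) for v in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Gradient of cross-entropy w.r.t. the scores: probs - one_hot(nxt).
            for j in range(n):
                target = 1.0 if j == nxt else 0.0
                row[j] -= lr * (probs[j] - target)
    return vocab, index, logits

def predict(vocab, index, logits, prev):
    """Most likely next character after `prev` under the trained table."""
    row = logits[index[prev]]
    return vocab[max(range(len(row)), key=row.__getitem__)]

vocab, index, logits = train_bigram("abcabcabcabc")
```

After training on the repeating string, the table predicts `b` after `a` and `a` after `c`. Nobody told it the pattern; the pattern was pressed into the numbers by prediction error alone.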

Look at what this system is not. It is bounded in every direction AIXI is not: finite compute, finite memory, a context window it cannot see past, no explicit weighing of consequences, and, once training ends, no clean objective it is still trying to maximize. It is a crude empirical approximation, and nobody can tell you exactly what it approximates.

Put the two next to each other, because that is where the safety question actually lives. AIXI assumes the goal is given and computation is free, and under those assumptions it is provably optimal. The model we built has neither luxury: computation is scarce, and its goal, after training, is a residue, a side effect of what happened to get rewarded, not anything a person specified. And notice that even the clean theory has its hole in the same spot. AIXI is only as good as the reward you feed it, and the mathematics says nothing about where that reward should come from or whether maximizing it is safe. Stating what we actually want, precisely enough that a powerful optimizer chasing it does not produce something we hate, is the hard part. That is the specification problem, and it does not soften as the systems grow more capable. It sharpens.

This is why the arguments sound the way they do. When people fight about alignment, about reward hacking, about deception, about whether a model “wants” anything at all, they are really arguing about the distance between the idealization and the artifact. The idealization tells you what optimal would look like and quietly assumes the goal away. The artifact is a powerful optimizer aimed at a goal nobody ever wrote down. The space between them is not a technicality. It is the whole question.

I wrote On Intelligence to build both sides from the ground up, for a reader willing to do some work but not required to arrive already knowing the math. It runs seventeen chapters across four parts: prediction (from Bayes up to Solomonoff), decision (reinforcement learning, agents, and AIXI), the specification problem (why the act of optimizing is the dangerous part), and reality (what a large language model really is, and the gap). I use mathematics where it earns its keep, and I explain it in plain words before any symbols appear.

I have tried not to overclaim. AIXI’s optimality is a theorem about a particular setup, not a law of nature, and whether today’s models are early steps toward anything resembling it is honestly unsettled. The book does not pretend to close that question. What it does is lay out both pictures clearly enough that you can see the gap yourself and judge how much it should worry you. My own answer is in the final chapter. I would rather you not take it on trust. I would rather hand you the pieces and let you check it.
