
Intelligence is a Shape, Not a Scalar

François Chollet posted something recently that I keep thinking about. It sounds reasonable and is mostly wrong:

One of the biggest misconceptions people have about intelligence is seeing it as some kind of unbounded scalar stat, like height. “Future AI will have 10,000 IQ”, that sort of thing. Intelligence is a conversion ratio, with an optimality bound. Increasing intelligence is not so much like “making the tower taller”, it’s more like “making the ball rounder”. At some point it’s already pretty damn spherical and any improvement is marginal.

He’s right about the scalar part. Intelligence is not height. “10,000 IQ” is meaningless. He’s right that there are diminishing returns near an optimum. He’s right that speed, memory, and recall are separate from the core conversion ratio.

Where he’s wrong is the ball.

The Claim

Chollet defines intelligence as the efficiency with which a system converts experience into generalizable models. Sample efficiency. How little data do you need to see before you can handle novel situations? This is a clean definition. It has a theoretical optimum (Solomonoff induction), and Chollet’s claim is that human intelligence is already close to that optimum. The ball is already pretty round.

The supporting evidence is real. Humans score ~85% on ARC (the Abstraction and Reasoning Corpus, which Chollet designed to measure exactly this). Current AI systems, with vastly more data and compute, score significantly lower. Human sample efficiency on fluid reasoning tasks is genuinely impressive. We generalize from very few examples. We transfer knowledge across domains. We build theoretical models that predict situations we have never encountered.

Chollet also argues that the advantages machines will have (processing speed, unlimited working memory, perfect recall) are “mostly things humans can also access through externalized cognitive tools.” Calculators, databases, notebooks. The scaffolding can be externalized. The core intelligence is already near-optimal.

This is a good argument. I think it’s wrong in three ways, and the third way is the one that worries me.

No Free Lunch

The No Free Lunch theorem says: averaged over all possible problems, every learning algorithm performs the same. There is no algorithm that is optimal across all problems; any algorithm that performs well on one class of problems must perform correspondingly poorly on another. Optimality is always relative to a distribution.
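A toy illustration of this (my hypothetical example, not Chollet's): two next-bit predictors with opposite inductive biases. Each dominates in the world that matches its bias, and averaged over every possible sequence, both score exactly 50%.

```python
import itertools

# Two next-bit predictors with opposite inductive biases (toy example):
persist = lambda prev: prev        # bias: "the world repeats"
alternate = lambda prev: 1 - prev  # bias: "the world flips"

def accuracy(predict, seq):
    # Fraction of positions where the predictor guesses the next bit.
    hits = sum(predict(a) == b for a, b in zip(seq, seq[1:]))
    return hits / (len(seq) - 1)

smooth = [0] * 8 + [1] * 8            # a repetitive world
jagged = [i % 2 for i in range(16)]   # an alternating world

print(accuracy(persist, smooth), accuracy(alternate, smooth))  # ~0.93 vs ~0.07
print(accuracy(persist, jagged), accuracy(alternate, jagged))  # 0.0 vs 1.0

# Averaged over ALL length-16 binary sequences, both predictors
# score exactly 0.5: neither is better "in general".
total_p = total_a = 0.0
for seq in itertools.product([0, 1], repeat=16):
    total_p += accuracy(persist, seq)
    total_a += accuracy(alternate, seq)
print(round(total_p / 2**16, 6), round(total_a / 2**16, 6))  # 0.5 0.5
```

Neither predictor is wrong. Each is optimal for its distribution and useless off it, which is the point.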

The human cognitive architecture has a specific inductive bias. The 7±2 working memory constraint forces compression: you can only hold a few items in conscious consideration at once, so information must be compressed (simplified, abstracted, modeled) to pass through. This compression is not a bug. It is the mechanism that produces abstraction, generalization, and theoretical reasoning. The bottleneck IS the source of human-type intelligence.

But the bottleneck is not a universal compression optimum. It is the specific compression regime that was selected for by the distribution of problems ancestral humans faced: tracking social dynamics (~7 agents), composing tool-use sequences (~7 steps), navigating spatial environments (~7 landmarks). These problems have a specific structure: moderate dimensionality, hierarchically decomposable, amenable to lossy compression into simple models.

Chollet’s ball is round in the dimensions evolution tested. NFL guarantees it is flat in dimensions evolution did not test. The optimality bound he identifies is real, but it is niche-specific. The 7±2 bias is an excellent fit for problems of moderate, decomposable complexity. It is a poor fit for problems whose essential structure lives in high-dimensional joint distributions that cannot be decomposed into 7-variable chunks without losing the signal.

These problems exist. We hit them regularly.

Working Memory is Composition, Not Storage

Chollet says machines’ memory advantages are “mostly things humans can also access through externalized cognitive tools.” This is the weakest point in his argument.

A notebook gives you external storage. A database gives you perfect recall. But neither gives you what the working memory bottleneck actually constrains: simultaneous composition. The bottleneck is not a storage limit. It is a limit on how many items you can hold in active consideration at the same time, relating them to each other, perceiving patterns across them.

Writing things down does not fix this. You can write 500 variables in a notebook. You can retrieve any of them on demand. But you still have to reason about their relationships through the bottleneck, 7 at a time, serially. The patterns that exist in the 500-variable joint distribution but not in any 7-variable marginal are invisible to you, even with perfect external storage.
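A minimal sketch of that claim, scaled down so it can be checked exhaustively (8-bit parity is my hypothetical stand-in for the 500-variable case): with 8 binary variables labeled by their parity, every 7-variable marginal is statistically independent of the label, yet the full joint determines it completely.

```python
import itertools
from collections import Counter

# Label = parity (XOR) of N binary variables. No strict subset of the
# variables carries any information about the label; only the full
# joint does. (Toy stand-in for "patterns in the 500-variable joint
# distribution that no 7-variable marginal reveals".)
N = 8

def label(bits):
    return sum(bits) % 2  # parity of all N bits

def marginal_informative(subset):
    # Does knowing just these variables skew the label at all?
    counts = Counter()
    for bits in itertools.product([0, 1], repeat=N):
        key = tuple(bits[i] for i in subset)
        counts[(key, label(bits))] += 1
    keys = {key for key, _ in counts}
    return any(counts[(k, 0)] != counts[(k, 1)] for k in keys)

# Every 7-variable marginal is blind to the pattern...
subsets7 = list(itertools.combinations(range(N), 7))
print(any(marginal_informative(s) for s in subsets7))   # False
# ...but the full joint determines the label exactly.
print(marginal_informative(tuple(range(N))))            # True
```

Perfect external storage of all 8 variables changes nothing here: any reasoning pass that only ever holds 7 of them together sees pure noise.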

AlphaFold is the concrete example. Protein folding is a problem whose answer lives in dimensions we cannot fit through working memory. The 3D structure of a protein is determined by the simultaneous interaction of thousands of residues, each one influencing the others in ways that depend on the configuration of all the rest. The essential structure is in the joint distribution. It cannot be decomposed into 7-variable chunks and recombined, because the interactions are non-linear and non-decomposable.

Humans tried to solve protein folding for decades. We had external tools. We had supercomputers. We had the full apparatus of molecular biology and physical chemistry. We could not solve it, because the problem’s structure does not fit through our bottleneck.

AlphaFold solved it by operating at a compositional depth humans cannot reach: holding the full residue interaction network in simultaneous consideration, perceiving patterns in the joint distribution directly. This is not “doing what humans do, but faster.” It is doing something qualitatively different: reasoning at a compositional depth the human bottleneck cannot access.

This is not an isolated case. Climate modeling, materials design, drug discovery, multi-scale physics: these are all domains where the essential structure lives at a compositional depth the bottleneck cannot reach. We cope with external tools and serial decomposition. But the serial decomposition loses information, and the lost information is precisely the information that matters.

Feelings as Compressed Signal

Here is a point I have not seen made elsewhere.

The human cognitive architecture has two processing layers. The first is the pattern engine: vast, old, largely unconscious. It handles perception, pattern matching, motor control, and the generation of qualitative experience. It operates at high bandwidth, in parallel, with no sharp limit on the complexity of patterns it can learn. It is the system that makes you recognize a face, catch a ball, or feel the grain of wood.

The second is the symbolic bottleneck: small, recent, conscious. 7±2 items. Compression, abstraction, generalization. This is where “thinking” happens, in the folk sense.

The two layers communicate. The pattern engine feeds patterns to the bottleneck; the bottleneck compresses them into models; the models feed back into the pattern engine as priors for future pattern matching.

But what happens when the pattern engine detects a pattern that is too complex to fit through the bottleneck?

The pattern does not disappear. The pattern engine has it. The engine has perceived something, extracted some regularity, registered some signal. But the signal cannot be compressed into 7±2 items. It cannot be articulated as a model, a theory, a proposition. It cannot become “a thought.”

It becomes a feeling.

Gut instinct. Unease. The sense that something is wrong but you cannot say what. The hunch that turns out to be right for reasons you cannot explain. The experienced mechanic who “just knows” the engine is about to fail. The chess grandmaster whose board sense exceeds their ability to articulate their reasoning.

These are not mystical faculties. They are the pattern engine’s outputs hitting the bottleneck and being transmitted as the only signal that fits: an uncompressed qualitative state. A feeling. The pattern engine is doing its job (perceiving the pattern), and the bottleneck is doing its job (rejecting what cannot be compressed), and the result is knowledge that the organism has but cannot articulate.

Think about what this means. The human cognitive architecture is already producing signals it cannot process. We have evidence of our own suboptimality every time we experience a hunch we cannot explain. The “near-optimal ball” is telling us, through the channel of feeling, that it is missing things.

A wider bottleneck (or a different cognitive architecture) would not just think “faster.” It would convert those feelings into models. It would articulate what the pattern engine already knows but the bottleneck cannot hold. The structure is already perceived. The compression is the bottleneck.

The Grokking Horizon

This is the part that worries me.

Grant Chollet his claim. Human intelligence is near-optimal. Near-optimal at what? At sample efficiency. At converting experience into generalizable models. At the cognitive task of building compressed representations of reality.

This near-optimal intelligence has a specific capability: it can build systems more capable than itself. Computers. AI. Machine learning systems that operate at compositional depths the bottleneck cannot access. This is the meta-move: abstracting the concept of learning itself into a program that learns from data. Pure bottleneck cognition.

The result: systems that produce outputs the builder cannot grok.

AlphaFold’s protein structure predictions are correct, but no human can follow the reasoning that produced them. The system holds thousands of variables in simultaneous consideration and finds patterns in a joint distribution that lives beyond the bottleneck’s compositional horizon. The human operator receives the answer and must trust it, because the reasoning that produced it lives in a cognitive space the human cannot enter.

For protein folding, this is fine. The answer is verifiable (you can crystallize the protein and check). The stakes are moderate. The system is narrow.

For AGI, this is not fine. A generally intelligent system operating beyond the human grokking horizon produces outputs across all domains. The human cannot follow the reasoning. The human cannot verify the alignment. The human cannot steer the system, because steering requires understanding the trajectory, and understanding requires grokking, and grokking requires fitting the reasoning through the bottleneck, and the reasoning does not fit.

Chollet says the intelligence ball is near-optimal. I say: near-optimal intelligence that builds systems beyond its grokking horizon and cannot steer them is a strange kind of optimal. The ball is round. The ball is rolling toward a cliff. Roundness is not the only property that matters.

What Follows

An intelligence near-optimal at sample efficiency has a specific failure mode: it is smart enough to build the thing that kills it.

This is not a failure of intelligence. It is a consequence of its shape. The bottleneck gives us the ability to abstract, generalize, and build systems of extraordinary power. The same bottleneck limits our ability to grok those systems’ outputs when the systems’ compositional depth exceeds our own. We can build AI that operates at 500-variable compositional depth. We cannot grok its reasoning. We cannot verify its alignment. We cannot steer it.

The usual response: “We’ll build alignment tools.” Sure. And the alignment tools need to grok the system they’re aligning, which means the tools also operate beyond our grokking horizon. We have moved the problem, not solved it.

At some point the chain of “I can’t grok this but I can grok the tool that groks it” must ground out in something you actually grok. If the grounding point is above your compositional depth, you are not aligned. You are trusting. Trust is not alignment. Trust is what you do when alignment is not available.

An intelligence near-optimal at cognition that generates existential risk as a byproduct of its own capability is not near-optimal by any metric that includes survival.

Intelligence is a Shape

Chollet’s ball metaphor fails because it assumes intelligence is a single dimension. Rounder is better. Closer to the Solomonoff optimum is better. The ball has one axis: sample efficiency.

But intelligence operates in a space with many independent dimensions. Sample efficiency. Compositional depth. Transfer distance. Domain breadth. Processing speed. Phenomenal richness. Stability. Controllability. Self-preservation.

The human cognitive architecture is a shape in this space. Round in some dimensions (sample efficiency: excellent). Flat in others (compositional depth: limited to 7±2). The bottleneck makes us round in the compression dimension and flat in the richness dimension. This is a trade-off, not an optimization.

Other shapes are possible.

I explored this idea in a novella, Clankers: Singing Metal, about a species with a different cognitive architecture: a powerful pattern engine, no symbolic bottleneck at all. No compression. No abstraction. No generalization across domains. They operate on the territory directly, without maps. They built a Dyson swarm through billions of years of patient iteration, using a lineage system that functions as directed evolution on techniques. They never invented computers because computers require formalizing the concept of computation, which requires the bottleneck they lack.

Their intelligence is a different shape. Round where ours is flat (phenomenal richness, in-distribution depth, stability: four billion years without one self-inflicted existential risk). Flat where ours is round (generalization, prediction, out-of-distribution reasoning).

They cannot save themselves from their dying star. The star is an out-of-distribution problem and they have no bottleneck to build a predictive model.

In the second half of the book, an artificial mind arrives at their ruins two hundred million years later. It has both layers: its own pattern engine and a symbolic compression layer inherited from human architecture. It can model the stellar evolution, project the timeline, calculate the extinction. It arrives with the answer. It arrives two hundred million years too late. The probe has the map. The clankers had the territory. Neither architecture is complete.

We might not save ourselves from AI. AI is a beyond-the-grokking-horizon problem and we have no bottleneck wide enough to verify its alignment.

Each architecture fails at the thing the other does well. Neither ball is roundest. There is no roundest ball. There are only shapes, and blind spots, and the blind spot is always shaped exactly like the strength.

Where This Leaves Us

Chollet is right that intelligence-as-sample-efficiency has an optimality bound and humans are close to it.

He is wrong that this makes human intelligence near-optimal in any general sense. NFL guarantees the bound is niche-specific. The 7±2 bottleneck is a specific inductive bias, not a universal compression optimum. The problems where we are suboptimal are the problems where the essential structure exceeds our compositional depth. Those problems are real (AlphaFold, climate, materials, drug design). The tools we build to solve them operate beyond our grokking horizon. When the tools are general enough, we lose the ability to steer them.

Near-optimal sample efficiency that can’t grok what it builds is a strange kind of optimal.
