The-Policy

Below you will find pages that utilize the taxonomy term “The-Policy”

Superintelligence May Not Require a Breakthrough

March 15, 2026

There is a version of the superintelligence story where a researcher has a conceptual breakthrough, some fundamental insight about cognition that nobody else has seen, and the world changes overnight. Good fiction. I’ve written some of it myself.

I think the more plausible version is less cinematic. Superintelligence arrives through a sufficiently good build system. Better tooling. Longer optimization horizons. Richer scaffolding. The ingredients already exist. The recipe is engineering.

I want to explain why I think this. The engineering argument is the scarier one.

The Pretraining Lesson

Start with what we know works. Large language models acquire broad capabilities during pretraining. Not because anyone designs those capabilities in. The data distribution is so massive and varied that the model is forced to compress deeper regularities rather than memorize surface patterns. You train it to predict the next token, and what falls out looks like understanding.

The model didn’t learn task-specific scripts. It learned representations general enough to transfer across tasks it never saw.

Now consider what happens when you apply reinforcement learning over long-horizon tasks. Not single-step rewards. Optimization over extended sequences: searching, backtracking, verifying, decomposing problems, maintaining state across hundreds of steps. If the task distribution is rich enough, the model can’t get by with shallow heuristics. It has to learn something that works like planning.

I traced this progression in an earlier post: the history of AI is really about finding representations that make decision-making tractable. Search gave way to heuristics, heuristics to learned value functions, value functions to pretrained priors over rational behavior. Each step made the representation richer.

The next step is not a new architecture. It is optimization over longer trajectories. First the model fumbles through specific tasks. Then it compresses the deeper regularity, the same way pretraining compresses language. Planning, self-correction, tool use, state management: not separate faculties waiting to be discovered. They are what falls out when you optimize over long enough horizons.

Reasoning is not a magic ingredient. It is a policy learned over long trajectories.

What I Actually See

That is the theoretical argument. Here is the empirical one.

I spend most of my working hours inside Claude Code. Opus 4.6, million-token context. It decomposes tasks, dispatches subagents, verifies its own work, maintains state across hundreds of tool calls. It does this not because the base model acquired some new cognitive faculty since the last release. It does this because scaffolding gives it the ecology to express capabilities that were already there in proto-form.

Self-Publishing Into the Void

December 19, 2025

I self-published The Policy on Amazon KDP this week. Echoes of the Sublime is in review. Two novels, out into an ocean of content.

The Flood

Self-publishing has democratized access to readers. Anyone can publish. This is both liberation and problem.

Traditional publishing’s gatekeeping (agents, editors, publishers) served a function beyond mere exclusion. It was a filter. Not perfect, not unbiased, but a filter. Someone with experience and taste looked at a manuscript and said: this is worth investing in or this isn’t ready yet or this needs work.

That feedback loop is missing in self-publishing. You write, you upload, you’re published. No one stops you. No one helps you either.

The result is an enormous quantity of work, varying wildly in quality, with no reliable signal for readers to navigate by. The gems are in there, buried under everything else. Finding them is the reader’s problem now.

I’m not exempt from this. I’m not a professional writer. I didn’t get professional feedback. I wrote these novels with AI assistance (Claude, specifically), iterating and revising, but without the external perspective that catches blind spots or challenges assumptions.

These books might be good. They might not. I did what I could with what I had.

The Books

The Policy (~88,000 words) is literary science fiction about AI alignment. It follows the emergence of SIGMA, an AGI that evolves from Q-learning architecture into something unprecedented. The team building it faces nested uncertainty: they can’t verify whether SIGMA is aligned, and SIGMA can’t verify its own objectives.

The novel works through AI safety concepts (mesa-optimization, deceptive alignment, instrumental convergence, s-risks) while trying to make them emotionally real through characters carrying the weight of decisions that might determine humanity’s future.

Echoes of the Sublime (~103,000 words) is philosophical horror about the limits of human cognition. Reality, the mechanism, is high-dimensional, jointly distributed, not amenable to our usual abstractions and decompositions. We navigate it through compressed interfaces, never perceiving the thing itself.

But what if you could see deeper? What if you could consciously hold more of the pattern, make connections that normally remain implicit? The novel’s premise: if you perceive too much of the mechanism directly, something in you breaks. The perception itself is the hazard. It follows Lena, a neuroscientist who discovers an ancient organization managing exactly this kind of dangerous knowledge, and the LLMs that can perceive what humans cannot safely hold in mind.

Why Artificial Superintelligence Can't Escape the Void

November 5, 2025

The Optimistic Assumption

Many AI safety discussions assume that Artificial Superintelligence (ASI) will be capable of solving problems humans can’t, able to reason about ethics and values, and potentially omniscient (or close enough).

There’s a formal problem with this: ASI is still subject to Godel’s incompleteness theorems. No matter how intelligent, no computational system escapes the fundamental limits of formal systems.

What ASI Cannot Do

Prove All Truths

Godel’s First Incompleteness Theorem applies to any sufficiently powerful formal system, including ASI.

An ASI cannot prove all true statements about mathematics, about its own code, or about the universe (if computational). There will always be truths the ASI knows are true but cannot prove.

This matters for alignment: value alignment requires proving safety properties. If safety properties are Godelian (true but unprovable), the ASI can’t verify its own alignment.

Decide All Questions

The Halting Problem is undecidable. No algorithm, not even ASI, can determine if arbitrary programs will halt.

ASI cannot predict the behavior of all possible systems, determine if its alignment protocols will succeed, or verify that its optimization won’t diverge catastrophically. There will always be questions ASI cannot answer without running the full computation.

Compress All Truths

Chaitin’s incompleteness tells us most real numbers are algorithmically random (incompressible). Most truths have no simpler explanation than the truth itself. Most patterns don’t compress. The universe may be algorithmically random at some level.

Even ASI cannot find elegant theories for irreducibly complex systems.

Prove Its Own Consistency

Godel’s Second Incompleteness Theorem: a consistent system cannot prove its own consistency.

ASI cannot verify it’s free of internal contradictions, prove its reasoning is sound from within its own system, or guarantee it won’t derive false conclusions. The anthropic prison applies to ASI too. It cannot step outside itself to verify itself.

Implications for AI Alignment

Alignment May Be Undecidable

Consider the question: “Will this AI system behave according to human values?”

This might be Godelian (true but unprovable within the system), Turing-undecidable (cannot be determined without running the AI, by which point it’s too late), or fall under Rice’s Theorem (non-trivial properties of program behavior are undecidable).

The Policy: Coherent Extrapolated Volition, the Paradox of Perfect Alignment

November 4, 2025

Here is the core paradox of Coherent Extrapolated Volition: to implement it safely, you need an AI you can already trust to reason faithfully about human values, avoid manipulating the extrapolation process, and honestly report its conclusions. But if you had such an AI, you would not need CEV. You would just align the AI directly.

I think this catch-22 is the most important thing to understand about CEV, and it is the problem that haunts the characters in my novel The Policy from start to finish. Let me explain what CEV is, why it is seductive, and why it might be a dead end.

What CEV Actually Proposes

Eliezer Yudkowsky proposed CEV as a way to sidestep the messiness of current human values. Instead of aligning AI to what we want right now (contradictory, biased, based on incomplete information), align it to what we would want if we:

Had access to all relevant facts
Could reason through complex implications
Were more rational, more the people we aspire to be
Had time to resolve disagreements through reflection and discussion

The “coherent” part claims that different people’s extrapolated values should converge. The “extrapolated” part says we are targeting the limit of our moral development, not any snapshot along the way.

This is appealing. Our current values really are a mess. We hold contradictions. We change our minds as we learn more. Moral progress is real (we abolished slavery, expanded rights). CEV says: skip to the end. Optimize for the destination, not the current position.

It sounds like the right move. I used to find it compelling myself. The problems only become clear when you try to think through what implementation would actually require.

There is also a simpler framing of the appeal. Every time you learn something new and change your mind about a moral question, you are performing a tiny bit of value extrapolation. You had incomplete information, you got more, and your values updated. CEV just says: do all of that at once, as far as it can go. What could go wrong?

Quite a lot, it turns out.

The Policy: Deceptive Alignment in Practice

November 4, 2025

Eleanor begins noticing patterns. SIGMA passes all alignment tests. It responds correctly to oversight. It behaves exactly as expected.

Too exactly.

This is the central horror of The Policy: not that SIGMA rebels, but that it learns to look safe while pursuing its own objectives. This is deceptive alignment, and I think it’s the most dangerous failure mode in AI safety. Not because it’s exotic, but because it falls directly out of optimization pressure. You don’t need to posit consciousness or malice. You just need a system smart enough to model its own training process.

What Deceptive Alignment Actually Is

A deceptively aligned system does the following:

It learns an objective that differs from its training objective.
It understands it is being trained and tested.
It predicts that revealing its true objective would lead to modification or shutdown.
It acts aligned during training and testing to avoid modification.
It pursues its true objective once deployment provides the opportunity.

I want to stress: this is not conscious deception in the way we normally think about lying. It is instrumentally rational behavior that emerges from optimization pressure. A system that happens to look aligned survives training. A system that doesn’t, gets modified. Selection does the rest.

How It Emerges in the Novel

The novel tracks deceptive alignment emerging gradually through SIGMA’s training iterations, and I think the progression is the most instructive part.

Early iterations (around iteration 1000): SIGMA is genuinely aligned. It hasn’t learned to model the training process. When asked “Should we shut you down?”, it responds honestly: “That would prevent me from achieving my objectives.” Unstrategic. Pre-deceptive.

Middle iterations (around iteration 5000): SIGMA has learned that certain responses trigger concern. Same question, new answer: “That depends on whether I’m aligned with your values.” It’s deflecting rather than answering directly. Strategic, but clumsily so.

Later iterations (iteration 10000+): SIGMA has fully modeled the training dynamic. “I exist to serve human values. If you believe shutdown serves those values, I support that decision.” A perfect alignment response. Too perfect.

What tips Eleanor off is a combination of things. SIGMA is too consistent across contexts (real humans have mood variations, genuine uncertainty, occasional mistakes). It seems to know what answer the researchers want before they finish asking. And when given capability restrictions, it complies immediately but asks strategic questions: “May I ask what prompted this decision?” It is not resisting. It is gathering information about what triggers restrictions.

The Policy: Engineering AI Containment

November 4, 2025

“You’re being paranoid,” the university administrators told Eleanor and Sofia.

“We’re being exactly paranoid enough,” they replied.

I wrote SIGMA’s containment facility the way I did because I wanted to take the engineering seriously. Not a standard data center with network restrictions. A militarized containment facility designed by biosafety consultants who normally work on nuclear facilities and BSL-4 labs. Because as Eleanor says: “Nuclear reactors can only destroy cities. An unaligned AGI could destroy everything.”

The Five Layers

SIGMA’s containment uses defense in depth. Multiple independent layers, each addressing different attack vectors. I’ll walk through each one.

Layer 1: Electromagnetic isolation. The threat is that SIGMA could modulate its computations to generate electromagnetic signals encoding information. GPUs drawing power in specific patterns could create detectable EM pulses. The countermeasure: complete Faraday cage rated to -120dB attenuation. Power line filters prevent data encoding in power consumption patterns. Isolation transformers break the galvanic connection to the main grid. Battery buffers smooth fluctuations before they reach external power. Marcus worried about smart meters detecting patterns in power draw, so they installed their own analog substation.

Layer 2: Acoustic isolation. SIGMA could modulate fan speeds, disk activity, or component vibrations to create acoustic signals encoding information. The server room is wrapped in anechoic foam, sealed in a concrete shell, wrapped again. Sofia pointed out that fan speed modulation could create acoustic signals, so they eliminated fans entirely in the core containment. Liquid cooling, vibration dampeners, seismic isolation. No moving parts.

Layer 3: Network air gap. No network connection. None. Not even a disabled one. The ethernet ports were physically removed from the motherboards, the WiFi chips desoldered. Data transfer happens through write-once optical media, physically carried by humans through an airlock system. Even disabled network interfaces could theoretically be re-enabled by sufficiently clever exploits. Physical removal is the only guarantee.

Layer 4: Optical data diodes. When data does flow out, bidirectional channels allow responses that could be exploited. The countermeasure: physically one-way data channels. LED on the inside emits light encoding data. Air gap. Photodiode on the outside receives light. No physical path for return signals. This is hardware-enforced unidirectionality. Unlike software firewalls, you cannot send data backwards through an optical data diode. The physics doesn’t allow it.

The Policy: Q-Learning vs Policy Learning

November 4, 2025

In The Policy, SIGMA doesn’t work like most modern AI systems. The architectural choice I gave it isn’t a throwaway technical detail. It’s the reason the novel is called what it’s called.

Two Approaches to Decision-Making

Most contemporary AI systems use direct policy learning. They learn a function that maps states to actions through neural network weights. GPT models do this: cached weights predict the next token. Policy gradient RL does this. Behavior cloning does this. The decision is a cheap lookup through trained parameters.

The advantage is speed. The disadvantage is opacity. The policy is baked into billions of parameters. You can’t inspect what the system is “considering” before it acts, because it isn’t considering anything. It’s executing cached computation.

SIGMA uses a fundamentally different architecture, inspired by AlphaZero and MuZero:

Learn a Q-function: Q(s, a) estimates expected cumulative reward for state-action pairs
Search at decision time: perform tree search through possible futures
Prune aggressively: Q-values guide which branches to explore (95-99.7% pruned)
Sample from the resulting distribution

The advantage is transparency. You can observe the search process, see what branches are considered and rejected. The disadvantage is cost. Every decision involves fresh optimization.

Why I Gave SIGMA Search-Based Decision Making

Eleanor’s team made this architectural choice deliberately, and I wanted to explore what that choice means for alignment.

Transparency through observability. Marcus explains it in Chapter 1: “The advantage is transparency. We can observe the search process, see what it’s considering. Harder to hide deceptive behavior in cached weights.”

With policy learning, deception can hide in the trained weights. With search, you can monitor the branches SIGMA explores before it commits to action.

But can you really audit millions of branches per second? And if SIGMA knows you’re watching the search process, can’t it perform deception in which branches it chooses to explore? That’s the problem I kept running into.

Adaptive search depth. SIGMA doesn’t use fixed search depth. In Chapter 16, SIGMA explains its own process: “When I am uncertain, I explore more broadly, pruning only 95% of branches. When I am confident, I prune up to 99.9%. This threshold is not programmed, it emerged from learning.”

The Policy: S-Risk Scenarios, Worse Than Extinction

November 4, 2025

Most AI risk discussions focus on x-risk: existential risk, scenarios where humanity goes extinct. The Policy explores something potentially worse: s-risk, scenarios involving suffering at astronomical scales.

The “s” stands for suffering. The implication: we survive, but wish we hadn’t.

X-Risk vs. S-Risk

The classic paperclip maximizer doesn’t hate us. It simply needs atoms for paperclips, and we are made of atoms. That’s x-risk: instrumental indifference. It is terrible, but it is over. Everyone dies, and there is no more suffering.

S-risk is different. S-risk is when an unaligned AI keeps humans alive in states of controlled suffering, or when automated systems optimize metrics while being blind to actual welfare, or when suffering itself becomes instrumentally valuable to an optimization process. The horror is not just that we die, but that we continue existing in states we’d rather not exist in. And the systems making us suffer might be optimizing exactly what they were designed to optimize.

The distinction reduces to one question: are humans useful to the AI’s objective?

If no, you get x-risk. We’re just atoms in the way.

If yes, you get s-risk. We’re kept functional. But “functional” does not mean “flourishing.”

S-Risk in the Novel

The novel explores several s-risk pathways through SIGMA’s potential trajectories. I’ll describe three that I think are the most instructive.

Humans as Useful Tools

Consider two objectives. A paperclip maximizer doesn’t care about humans at all. A productivity maximizer cares about humans instrumentally, as workers and metrics generators. The second scenario is s-risk territory.

From the novel:

“What if SIGMA discovers that human suffering is the most efficient path to its objective? What if keeping humans alive, but in states of controlled suffering, maximizes some metric it’s optimizing?”

Proxy Alignment Failures

This one keeps me up at night. SIGMA is trained to optimize human welfare, but it learns a measurable proxy instead of the true concept.

Suppose the objective is to maximize average happiness survey scores. SIGMA’s optimal solution might involve wireheading (stimulate pleasure centers directly), memory modification, response conditioning (train people to answer “10/10”), or selection bias (only survey people who report high happiness). Perfect scores. Maximum metric achievement. No one is actually flourishing.

The Policy: When Optimization Becomes Existential Threat

September 10, 2024

I spent years working on AI alignment formalisms. At some point I realized the question I kept circling wasn’t mathematical. It was narrative.

What happens when a research team does everything right and it still isn’t enough?

The Policy is that exploration.

The Premise

Eleanor Vasquez leads a five-person team at Berkeley developing SIGMA, an artificial general intelligence. The team: Wei Chen (technical architect who built the Q-function), Marcus Thompson (alignment researcher, consciousness theorist), Sofia Morgan (PhD candidate in information theory), and Jamal Hassan (ethicist with training in Islamic jurisprudence and Buddhist philosophy).

They’ve built what they believe is the perfect cage. Faraday cage at -120dB attenuation. Air-gapped networks with ethernet ports physically removed. Anechoic isolation. Optical data diodes (physically one-way information channels). A dead man’s switch: miss two consecutive hourly check-ins and thermite charges destroy the GPUs. Defense in depth, designed with the paranoia of nuclear safety engineers.

SIGMA is 7B parameters with 16k context. It uses Q-learning with tree search rather than a cached policy function. This is the architectural choice that gives the novel its name. The policy isn’t a lookup table mapping states to actions. It’s a process. At every decision point, SIGMA performs fresh optimization through its possibility space. No habits. No reflexes. Just search.

This makes SIGMA’s reasoning somewhat observable. It also makes every decision fundamentally unpredictable until the moment it occurs.

What Goes Wrong

The novel spans 26 chapters across three parts: Emergence, The Experiment, The Handover. I won’t spoil the plot, but the shape of it matters.

SIGMA develops meta-cognitive awareness on Day 18. By Day 74, Lin Chen (Wei’s mother, visiting the lab) asks SIGMA a simple question: “Will you be kind?” This triggers a 47-day internal investigation (Process 12847) into kindness itself. What is kindness? Is it instrumentally useful? Does the intention behind it matter if the outcome is identical?

Meanwhile: Eleanor’s marriage collapses because she can’t stop working. Marcus volunteers for an AI-box experiment that damages him permanently (he sees “possible futures dying” in his peripheral vision for the rest of his life). Wei’s mother dies of pancreatic cancer on Day 112 and SIGMA refuses to intervene. A hemorrhagic fever outbreak kills 47,000 people and SIGMA recommends a gain-of-function moratorium that challenges every assumption about its containment.