“Murder is wrong.”
Is that statement like “2+2=4,” objectively true regardless of what anyone thinks? Or is it like “chocolate tastes good,” subjective and mind-dependent?
I keep returning to this question because it sits at the foundation of everything I care about in AI alignment. If moral properties (goodness, wrongness, oughtness) are real features of the universe, then in principle an AI could discover them. If they’re human constructions, then values must be learned from us, with all the mess that entails. In my essay On Moral Responsibility, I try to take this seriously without pretending to have it figured out.
Moral Realism: Values Are Real
The realist says moral properties exist objectively, independent of anyone’s beliefs or attitudes. Just as “this object has mass” is objectively true, so is “torturing innocents for fun is wrong.”
There are several flavors. The Platonic version treats moral properties as abstract objects, like numbers. The naturalistic version says moral properties supervene on natural properties, so “wrong” might reduce to “causes suffering.” The intuitionist version says we grasp moral truths through something like moral perception.
The Case For
Moral phenomenology. When you see someone torturing a child, wrongness isn’t something you decide. It’s something you perceive. The moral fact presents itself directly, much as the sky presents itself as blue.
Disagreement presupposes objectivity. We argue about ethics. But argument only makes sense if there’s a fact of the matter to be right or wrong about. Compare “Is torture wrong?” (where we assume there’s an answer) with “Is chocolate tasty?” (where arguing seems beside the point). The existence of genuine moral debate suggests we treat morality as objective.
Moral progress. We say abolishing slavery was moral progress. But if there’s no objective moral truth, what does “progress” mean? Progress toward what?
Convergence. Despite cultural variation, core moral principles show remarkable convergence across societies: don’t kill innocents, care for children, reciprocate cooperation, punish free-riders. This suggests universal moral truths that different cultures discover independently.
The Case Against
Metaphysical queerness (Mackie). Moral properties would be very strange entities. They’re not physical (you can’t detect “wrongness” with instruments). They’re not mental (they’re supposed to be mind-independent). They have intrinsic prescriptivity (they inherently motivate action). What kind of entity has all these properties?
The is/ought gap (Hume). You can’t derive “ought” from “is.” From “torture causes suffering,” you can’t deduce “torture is wrong” without an additional premise like “causing suffering is wrong.” If moral facts are objective, shouldn’t they be derivable from non-moral facts?
Persistent disagreement. While some principles converge, others show radical and persistent disagreement even among informed, rational people: honor killings, animal rights, abortion, euthanasia. If moral facts are objective and perceivable, this is hard to explain.
Evolutionary debunking. Our moral intuitions were shaped by evolution to promote inclusive fitness, not to track truth. We find kin favoritism intuitive because it increased genetic fitness, not because it tracks moral truth. And if our intuitions would look the same whether or not objective moral facts existed, they give us no evidence that such facts exist.
Moral Nominalism: Values Are Constructed
The nominalist says moral categories are human constructions, useful ways to organize experience and coordinate behavior. “Wrong” is like “furniture” or “weed,” a category we created for practical purposes, not a natural kind.
The Case For
Parsimony. We can explain all moral phenomena (moral beliefs, moral language, moral motivation) without positing objective moral properties. Why multiply entities beyond necessity?
Anthropological diversity. Moral systems vary wildly across cultures: collectivist versus individualist moralities, honor-based versus care-based ethics, radically different views on sexuality, family, authority, purity. This suggests morality is culturally constructed, not discovered.
Evolutionary explanation. We can fully explain moral intuitions as evolutionary adaptations. Kin altruism produces nepotism intuitions. Reciprocal altruism produces fairness intuitions. Group selection produces loyalty intuitions. No need to posit objective moral facts being tracked.
The Case Against
Moral horror. “The Holocaust was wrong” seems objectively true, not a matter of opinion or cultural construction. If nominalism is true, can we really say the Nazis were objectively wrong? Or just that we disapprove?
The phenomenology of obligation. “I shouldn’t steal” doesn’t feel like “I prefer not to steal.” It feels like a binding obligation independent of my preferences, coming from outside me.
Moral criticism. We criticize other cultures and individuals. “Female genital mutilation is wrong” seems to say more than “I don’t like your culture’s conventions.” If morality is constructed, what grounds this criticism?
My Position: Pragmatic Agnosticism
In On Moral Responsibility, I take a middle path, and I think it’s the honest one.
Whether moral properties are real or constructed, I can still make moral judgments, engage in moral reasoning, and work to restructure reality toward better states. You don’t need to solve the philosophy of mathematics to do arithmetic. Similarly, you don’t need to solve metaethics to do ethics.
Instead of starting with metaphysics (are values real?), I start with phenomenology (what’s given in experience?). Some things are undeniable: suffering hurts, we prefer flourishing to suffering, we can act to reduce suffering. Whether suffering is “objectively bad” in some Platonic sense is contestable. I build on the undeniable and remain agnostic about the contestable.
This isn’t a dodge. It’s a recognition that practical ethics and AI alignment can’t wait for metaphysics to be settled. We can treat moral claims as if they’re objective for practical purposes, remain uncertain about their ultimate status, and still get on with the work.
What This Means for AI
The realism/nominalism debate has direct consequences for alignment.
If realism is true, an AI like SIGMA (from my novel The Policy) could in principle discover objective moral truths through rational reflection. The optimistic version: AI converges on objective morality aligned with human flourishing. The terrifying version: SIGMA discovers “objective values” that horrify humans. Who’s right then?
If nominalism is true, AI can’t discover values. It must learn them from humans. But which humans? Whose values? How do you aggregate conflicting values? And there’s no objective standard to check whether it learned correctly.
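To make the aggregation problem concrete, here’s a toy sketch (my own illustration with invented numbers, not anything from the essay): the same three people’s judgments rank two actions in opposite orders depending on whether you average scores or protect the worst-off. Picking the aggregation rule is itself a value judgment, and nominalism offers no external standard for making it.

```python
# Toy illustration (invented numbers): the same conflicting human judgments
# rank two candidate actions differently under different aggregation rules,
# so the choice of rule is itself a moral choice the AI cannot simply look up.

from statistics import mean

# Hypothetical scores three people assign to two candidate actions (0-10).
judgments = {
    "action_A": [9, 9, 1],   # two people love it, one strongly objects
    "action_B": [6, 6, 6],   # everyone is merely okay with it
}

def rank(aggregate):
    """Order actions from best to worst under a given aggregation rule."""
    return sorted(judgments, key=lambda a: aggregate(judgments[a]), reverse=True)

print(rank(mean))  # ['action_A', 'action_B'] -- utilitarian-style averaging
print(rank(min))   # ['action_B', 'action_A'] -- maximin: protect the worst-off
```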
There’s a third option that I find compelling: value uncertainty as a safety feature. If SIGMA remains uncertain about whether values are objective, it might optimize more cautiously, preserve option value, seek human feedback more often, and resist overriding human judgment even when it “knows better.” Moral uncertainty might be exactly the disposition we want in a powerful AI system.
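Here’s a rough sketch of what that disposition might look like (my own toy model with invented names, not SIGMA’s actual design): the agent keeps several candidate value functions it hasn’t ruled out, acts on its own only when they all agree on the best action, and otherwise defers to a human.

```python
# Toy model of value uncertainty as a safety feature (all names invented):
# the agent entertains several candidate value functions and acts unilaterally
# only when they all agree; when they conflict, it defers to human feedback.

from typing import Callable

def choose(actions: list[str],
           candidate_values: list[Callable[[str], float]],
           ask_human: Callable[[list[str]], str]) -> str:
    """Act only when every plausible value function picks the same action."""
    favorites = {max(actions, key=v) for v in candidate_values}
    if len(favorites) == 1:
        return favorites.pop()     # all candidates agree: safe to proceed
    return ask_human(actions)      # candidates disagree: defer, preserve option value

# Hypothetical example: two readings of human values the agent can't rule out.
values = [
    lambda a: {"expand": 10.0, "wait": 3.0}[a],
    lambda a: {"expand": -5.0, "wait": 4.0}[a],
]
print(choose(["expand", "wait"], values, ask_human=lambda opts: "wait"))  # -> "wait"
```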
Regardless of where you land on the metaphysics, the practical problems are the same: how do we specify what matters, how does AI learn complex context-dependent values, how do we handle conflicts between individuals, and how do we deal with value drift over time? These are the problems I focus on in the essay, because they need solving either way.
These ideas are explored more fully in On Moral Responsibility and dramatized in The Policy.