Classical Information First: The Coin Under the Cup

This series builds a quantum computer simulator from nothing, and the next post starts it properly with the qubit. This one comes first because a qubit is not a strange new object dropped in from nowhere. It is an ordinary object, the classical bit, with exactly one rule changed. If I build the ordinary object carefully, in code, the quantum one turns out to be almost a one-line edit, and the rest of the series is the story of what that edit buys. So there are no qubits here until the last few lines. There are coins, NumPy arrays, and the probability theory you already half-know, written down until it is explicit.

import numpy as np

The coin under the cup

Take a coin out of your pocket, spin it, catch it, and slap it down on the back of your hand. Before you look, cover it with a cup.

Is it heads or tails?

You do not know. But you do not doubt for a moment that it is one or the other. Under the cup, in the dark, the coin is in some perfectly definite state. It has been since your hand came down. The not-knowing is yours. The coin already made up its mind.

Now bring in a friend. While you look away, they lift the edge of the cup, glance under it, and set it back down without a word. Ask each of you the odds. For you, still fifty-fifty. For your friend, no odds at all: it is heads, plainly, a settled fact. Same coin, same cup, same instant, two different numbers. Nothing about the coin changed when your friend looked. No force reached under the cup. The only thing that changed in the whole room was what one of you knew.

So the fifty-fifty was never a property of the coin. It was a property of you, a fair summary of a mind holding two live guesses with no way yet to choose between them. The probability lived in the gap between the coin and your knowledge of it, and your friend, by looking, closed the gap. What your friend gained has a plain name: information. The coin holds a single piece of it, one yes-or-no, the piece engineers call a bit, and your uncertainty was nothing but the absence of that one piece.

That is the whole starting point, and it is worth holding onto because the quantum world is going to break it: the thing is definite, the odds belong to the knower, and information is what closes the gap.

A state is a vector of what you know

Everything that follows is just that idea written as arithmetic. Call the two outcomes heads and tails, and give each a slot in a length-two array. Your state of knowledge is a pair of non-negative numbers that sum to one: how much of your belief sits on each outcome.

state = np.array([0.5, 0.5])   # fifty-fifty: I know it is a coin, nothing more
state
array([0.5, 0.5])

When you are certain, the array is one-hot: all the weight on one outcome, none on the other. The friend who peeked is holding one of these.

heads = np.array([1.0, 0.0])
tails = np.array([0.0, 1.0])
heads
array([1., 0.])

There is nothing behind the array. The distribution is not a blurry picture of a hidden number; it is the state of your knowledge, in full. The coin is heads or tails underneath, but what you carry, and the only thing any operation below acts on, is this vector.
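The two conditions on a state (no negative weights, total of one) are easy to check mechanically. Here is a minimal sketch; `is_distribution` and its tolerance `tol` are hypothetical names chosen for illustration, not part of NumPy.

```python
import numpy as np

def is_distribution(v, tol=1e-12):
    """True if v is a 1-D array of non-negative weights that sum to one.

    Hypothetical helper for illustration; tol absorbs floating-point rounding.
    """
    v = np.asarray(v, dtype=float)
    return bool(v.ndim == 1 and np.all(v >= 0.0) and abs(v.sum() - 1.0) <= tol)

fifty_fifty = np.array([0.5, 0.5])
heads = np.array([1.0, 0.0])
not_a_state = np.array([1.5, -0.5])   # sums to one, but one weight is negative

print(is_distribution(fifty_fifty))   # True
print(is_distribution(heads))         # True
print(is_distribution(not_a_state))   # False
```

The last case is the one to remember: summing to one is not enough on its own, and the non-negativity check is doing real work.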

Operations are stochastic matrices

If a state is a vector, an operation on it is a matrix, and it acts by matrix multiplication. The simplest is a deterministic flip, the classical NOT, which swaps heads and tails. As a matrix it has a single one in each column and zeros everywhere else: each column says where the corresponding input is sent.

NOT = np.array([[0.0, 1.0],
                [1.0, 0.0]])
NOT @ heads
array([0., 1.])

An operation can also be random. A fresh fair toss throws away whatever you knew and returns fifty-fifty no matter what you fed in. A worn, biased process might leave the coin mostly as it was and flip it now and then. Both are matrices whose columns are themselves probability vectors: non-negative, each summing to one. These are the (column-)stochastic matrices, and they are exactly the matrices that send every distribution to a distribution.

fair  = np.array([[0.5, 0.5],
                  [0.5, 0.5]])   # re-randomize: forget the input
drift = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # mostly keep it, flip 1 in 10

drift @ heads                    # start certain-heads, come out 90/10
array([0.9, 0.1])
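The column condition can be tested directly. The sketch below assumes the column convention used in this post (each column is a probability vector); `is_stochastic` and the counterexample `broken` are hypothetical names introduced only for this check.

```python
import numpy as np

def is_stochastic(M, tol=1e-12):
    """True if M is a 2-D array with no negative entries whose columns each sum to one."""
    M = np.asarray(M, dtype=float)
    if M.ndim != 2:
        return False
    columns_sum_to_one = np.allclose(M.sum(axis=0), 1.0, rtol=0.0, atol=tol)
    return bool(np.all(M >= 0.0) and columns_sum_to_one)

NOT   = np.array([[0.0, 1.0], [1.0, 0.0]])
fair  = np.array([[0.5, 0.5], [0.5, 0.5]])
drift = np.array([[0.9, 0.1], [0.1, 0.9]])

# Rows sum to one here, but columns do not, so this is not a valid
# operation under the column convention.
broken = np.array([[0.9, 0.1], [0.9, 0.1]])

print([is_stochastic(M) for M in (NOT, fair, drift)])   # [True, True, True]
print(is_stochastic(broken))                            # False
print(broken @ np.array([1.0, 0.0]))                    # [0.9 0.9], which sums to 1.8
```

The failing case shows why the condition sits on columns: feed certain-heads into `broken` and what comes out is no longer a distribution.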

The payoff is composition: running one operation after another is multiplying their matrices, with the later operation written on the left. That gives you a single matrix for the whole pipeline, and it is still stochastic. Toss the fair coin, then toss it again, and multiplying the matrices confirms what you already expect, that a second fair toss changes nothing.

fair @ fair          # two fair tosses back to back
array([[0.5, 0.5],
       [0.5, 0.5]])
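Two fair tosses are a slightly too easy case, since `fair` forgets its input. A sketch with less forgiving matrices makes the same point and also shows that order matters. The `reset` matrix (always put the coin heads-up) is a hypothetical operation added here for illustration.

```python
import numpy as np

heads = np.array([1.0, 0.0])
NOT   = np.array([[0.0, 1.0], [1.0, 0.0]])
drift = np.array([[0.9, 0.1], [0.1, 0.9]])
reset = np.array([[1.0, 1.0], [0.0, 0.0]])   # hypothetical: always output heads

# Two drift steps applied one at a time...
step_by_step = drift @ (drift @ heads)

# ...and the same two steps collapsed into a single matrix first.
two_drifts  = drift @ drift
all_at_once = two_drifts @ heads

print(np.allclose(step_by_step, all_at_once))      # True
print(np.allclose(two_drifts.sum(axis=0), 1.0))    # True: columns still sum to one

# The operation applied first sits on the right of the product.
print((NOT @ reset) @ heads)   # reset, then NOT: [0. 1.]
print((reset @ NOT) @ heads)   # NOT, then reset: [1. 0.]
```

Reading a product right to left is the usual convention for column vectors, and it is the one every snippet in this post relies on.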

That is the entire mechanics of classical information: states are vectors, certain or not; operations are stochastic matrices; and to chain them you multiply. Hold that in place, because the quantum world keeps every word of it and changes only what the numbers are allowed to be.

The one rule: weights never go negative

There is one constraint underneath all of this that is so obvious it is easy to walk past, and it is the constraint the whole series turns on. The weights in these vectors and matrices are probabilities, so they are never negative. They only ever add.

Watch what that forces. Suppose an outcome can be reached by more than one route, through some intermediate step. The total probability of arriving is the sum of what each route contributes, and every contribution is a non-negative number. Here is a two-stage process, and I will pull out the separate routes to one outcome.

s0     = np.array([1.0, 0.0])   # start certain
A      = np.array([[0.5, 0.5],
                   [0.5, 0.5]])  # stage one: branch
B      = np.array([[0.5, 0.5],
                   [0.5, 0.5]])  # stage two: recombine

mid    = A @ s0                 # distribution after stage one
routes = B[0, :] * mid          # each route's contribution to outcome 0
routes
array([0.25, 0.25])

Each route hands outcome 0 a piece of probability, and the total is just their sum.

float(routes.sum())
0.5

Now the part that has no escape hatch. Because every contribution is at least zero, opening a second route to an outcome can never lower its total: it raises it or, at worst, leaves it unchanged. More ways to arrive never means less chance of arriving.

one_route  = float(routes[0])               # only the first route open
two_routes = float(routes[0] + routes[1])   # both routes open
one_route, two_routes
(0.25, 0.5)

There is no way to arrange a classical coin so that opening a second path to heads makes heads less likely. To make routes work against each other you would need one of them to contribute a negative weight, and a negative weight is not a probability. It is forbidden by the one rule. Probability piles up. It never digs a hole.
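The same bookkeeping works for any pair of stochastic matrices, not just two fair tosses. The sketch below wraps it in a hypothetical helper, `route_contributions`, and checks the claim on an uneven example: the running total over routes never decreases.

```python
import numpy as np

def route_contributions(s0, A, B, outcome):
    """Contribution of each intermediate state to `outcome` after A, then B."""
    mid = A @ s0                 # distribution after stage one
    return B[outcome, :] * mid   # one non-negative term per route

s0    = np.array([1.0, 0.0])
drift = np.array([[0.9, 0.1], [0.1, 0.9]])
fair  = np.array([[0.5, 0.5], [0.5, 0.5]])

routes = route_contributions(s0, drift, fair, outcome=0)
running_total = np.cumsum(routes)   # total as routes are opened one by one

print(np.all(routes >= 0.0))                               # True
print(np.all(np.diff(running_total) >= 0.0))               # True: the total never goes down
print(np.isclose(routes.sum(), (fair @ drift @ s0)[0]))    # True: routes add up to the full answer
```

The last line ties the route picture back to matrix multiplication: summing over routes is exactly what the product `fair @ drift` does for you.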

The one change that makes it quantum

Here is the whole crossing into the quantum world, and it really is this small. Keep the vectors. Keep the matrices. Keep multiplying to compose. Change only the one rule: let the weights be amplitudes, numbers that may carry a minus sign, and recover a probability only at the very end, by squaring the total amplitude that has arrived at an outcome.

That single permission is the entire difference, because now two routes to the same outcome can carry opposite signs, and opposite signs cancel.

0.5 + 0.5            # classical weights: two routes pile up
1.0
(0.5 + -0.5) ** 2    # amplitudes: opposite signs, squared at the end, cancel to nothing
0.0

Two routes to a place, and the chance of arriving drops to zero. That is the impossible-sounding thing a classical coin can never do, and it is what an amplitude does as a matter of course. The cancellation has a name, interference, and everything in this series that sounds strange (superposition, the algorithms, the whole apparent magic of the thing) is this one move and nothing else.
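To see the cancellation in the same two-stage shape as the classical routes example, here is a sketch that reruns it with amplitudes. The matrix `H` is my assumption for illustration: it is the standard Hadamard matrix, which the post itself does not define, and the next post may well set things up differently.

```python
import numpy as np

s0 = np.array([1.0, 0.0])                  # start certain, as before
H  = np.array([[1.0,  1.0],
               [1.0, -1.0]]) / np.sqrt(2)  # assumed example: one entry is negative

mid = H @ s0                     # amplitudes after stage one
routes_to_0 = H[0, :] * mid      # each route's amplitude for outcome 0
routes_to_1 = H[1, :] * mid      # each route's amplitude for outcome 1

prob_0 = routes_to_0.sum() ** 2  # routes agree in sign and reinforce
prob_1 = routes_to_1.sum() ** 2  # routes disagree in sign and cancel

print(np.round(routes_to_0, 3), np.round(routes_to_1, 3))
print(np.isclose(prob_0, 1.0), np.isclose(prob_1, 0.0))   # True True
```

Compare with the classical run, where both routes handed outcome 0 a quarter each and the total was a half. Here the two routes to outcome 1 are equal and opposite, so that outcome never happens, and all of the probability lands on outcome 0.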

The object that carries the change is the qubit, and building it is the next post. It keeps the length-two array. It keeps the matrix operations. It relaxes exactly the rule we just spent this whole post making explicit, so that you can see precisely what is being given up and what is bought in return. Bring the coin. Bring your certainty about it.

A companion note

This post follows the on-ramp of my book Multitudes: The Indifference of Measure, whose Part I builds the same physics from this same coin under the same cup, in pictures rather than code. If the prose here lands for you, the book is the long version: Amazon and metafunctor.com/writing/multitudes.

The ladder this post climbs, classical states as vectors, then operations as matrices, then the single quantum change, follows John Watrous’s lovely treatment in Basics of Quantum Information. The punchline is his: as small as the change is, it is the whole of the difference.
