Kraft's Inequality
Which codeword-length vectors are achievable by prefix-free codes? Kraft's inequality is the answer.
A pedagogical series exploring information theory by construction in C++23
This series develops information theory the same way the Stepanov series develops algorithms: by construction, with minimal pedagogical code, and with the algebraic structure made explicit.
The Stepanov series argued that the algebraic structure of a type determines its algorithms. This companion series argues the analogous claim for bit-level encodings: the algebraic structure of a code determines its compression efficiency, its decoding cost, and its compositional behavior. Each universal code corresponds to a different prior over the integers; each entropy-optimal code (Huffman, arithmetic) is best under different assumptions; each succinct data structure is the right answer under different access patterns.
A prefix-free code assigns codeword lengths $(l_1, \ldots, l_n)$ to symbols. Kraft’s inequality characterizes which length vectors are achievable:
$$\sum_i 2^{-l_i} \leq 1.$$

Within this budget, every code is a different way of spending it. Allocating more bits to symbol $i$ means less budget for the others. Information theory’s optimum (assigning $-\log_2 p_i$ bits to a symbol with probability $p_i$) saturates Kraft exactly when the probabilities are dyadic, i.e. when each $p_i$ is a power of $1/2$.
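To make the budget concrete, here is a minimal check in the pedagogical style of this series (a sketch of ours, not PFC's API). It evaluates the Kraft sum exactly in 64-bit fixed point, so floating-point rounding never blurs the boundary between a saturated budget and a violated one; lengths are assumed to lie in $[1, 64]$.

```cpp
#include <cstdint>
#include <print>
#include <span>
#include <vector>

// Checks sum_i 2^{-l_i} <= 1 exactly in fixed point: the term 2^{-l}
// is stored as the integer 2^{64-l} (units of 2^{-64}), so the budget
// "1" is exactly one wraparound of a 64-bit accumulator.
// Assumes every length lies in [1, 64].
constexpr bool satisfies_kraft(std::span<const unsigned> lengths) {
    std::uint64_t acc = 0;  // running Kraft sum, in units of 2^{-64}
    bool full = false;      // true once the sum hits exactly 1
    for (unsigned l : lengths) {
        if (full) return false;  // budget already spent
        const std::uint64_t term = std::uint64_t{1} << (64 - l);
        acc += term;
        if (acc < term) {                // wrapped past 2^64, i.e. past 1
            if (acc != 0) return false;  // sum > 1: not achievable
            full = true;                 // sum == 1: budget saturated
        }
    }
    return true;
}

int main() {
    std::vector<unsigned> dyadic{1, 2, 3, 3};  // 1/2 + 1/4 + 1/8 + 1/8 = 1
    std::vector<unsigned> slack{2, 2, 3};      // 1/4 + 1/4 + 1/8 < 1
    std::vector<unsigned> over{1, 1, 2};       // 1/2 + 1/2 + 1/4 > 1
    std::println("{} {} {}", satisfies_kraft(dyadic),
                 satisfies_kraft(slack), satisfies_kraft(over));
    // prints: true true false
}
```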
This series walks through the codes that have shown up in practice, treating each one as a different hypothesis about the integer distribution it expects to encode. The choices are not arbitrary: each code is optimal somewhere, and recognizing where reframes “compression algorithm” as “model selection.”
The code in this series is pedagogical. The production version (with full STL integration, 31k+ test assertions, and the rich combinator library these posts only sketch) lives in PFC.
The Stepanov series develops algorithms from the algebra of types. The bridge posts (Bits Follow Types, When Lists Become Bits) connect that series to this one.
The posts in the series:

Which codeword-length vectors are achievable by prefix-free codes? Kraft's inequality is the answer.
Any length vector satisfying Kraft has a prefix-free code. Here is the construction.
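A sketch of that construction (the function name `kraft_construct` is ours): sort the lengths, then hand out codewords left to right along the unit interval, spending $2^{-l}$ of the budget per codeword. Because the lengths are processed in nondecreasing order, the running sum is a multiple of $2^{-l}$ at every step, so its top $l$ bits are the next codeword.

```cpp
#include <algorithm>
#include <cstdint>
#include <print>
#include <string>
#include <vector>

// Kraft's construction: with the lengths sorted and Kraft satisfied,
// each codeword is the top l bits of the running sum, and no codeword
// can be a prefix of a later (longer) one.
// Assumes lengths in [1, 64]; returns codewords in sorted-length order.
std::vector<std::string> kraft_construct(std::vector<unsigned> lengths) {
    std::ranges::sort(lengths);
    std::vector<std::string> codewords;
    std::uint64_t acc = 0;  // running Kraft sum, in units of 2^{-64}
    for (unsigned l : lengths) {
        const std::uint64_t top = acc >> (64 - l);  // top l bits of the sum
        std::string w(l, '0');
        for (unsigned b = 0; b < l; ++b)
            if ((top >> (l - 1 - b)) & 1) w[b] = '1';
        codewords.push_back(std::move(w));
        acc += std::uint64_t{1} << (64 - l);  // spend 2^{-l} of the budget
    }
    return codewords;
}

int main() {
    for (const std::string& w : kraft_construct({2, 1, 3, 3}))
        std::println("{}", w);  // prints 0, 10, 110, 111
}
```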
Every prefix-free code is a hypothesis about the source. The codeword lengths determine an implicit probability distribution; the code is optimal when that prior matches the true source.
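To make "hypothesis" precise: for lengths that saturate Kraft, the implicit distribution is $q_i = 2^{-l_i}$, and coding a true source $p$ with those lengths costs, per symbol,

$$\mathbb{E}_p[l] = \sum_i p_i \log_2 \frac{1}{q_i} = H(p) + D_{\mathrm{KL}}(p \,\|\, q),$$

so the excess over entropy is exactly the KL divergence, which vanishes precisely when $q = p$.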
Unary and Elias gamma are the two simplest universal codes. Unary encodes $n$ in $n$ bits; gamma in $2\lfloor\log_2 n\rfloor + 1$ bits. Each implies a different prior over the integers.
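A minimal gamma encoder, assuming $n \geq 1$ (the name `elias_gamma` is ours). Its zero-run prefix is itself a unary code for the bit length, which is what makes gamma self-delimiting.

```cpp
#include <bit>
#include <cstdint>
#include <print>
#include <string>

// Elias gamma for n >= 1: emit floor(log2 n) zeros, then the binary
// digits of n MSB-first (which begin with a 1). Total length is
// 2*floor(log2 n) + 1 bits, matching the formula above.
std::string elias_gamma(std::uint64_t n) {
    const int k = std::bit_width(n) - 1;  // k = floor(log2 n)
    std::string out(k, '0');              // unary prefix: "n has k+1 bits"
    for (int b = k; b >= 0; --b)          // then n itself, MSB first
        out.push_back(((n >> b) & 1) ? '1' : '0');
    return out;
}

int main() {
    for (std::uint64_t n : {1, 2, 5, 13})
        std::println("gamma({}) = {}", n, elias_gamma(n));
    // gamma(1) = 1, gamma(2) = 010, gamma(5) = 00101, gamma(13) = 0001101
}
```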
Elias delta and omega extend Elias gamma by recursively encoding the length prefix. Each step yields shorter codewords for large integers at a small constant cost for small ones.
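A sketch of delta, with the gamma helper repeated so the example stands alone (both names are ours, and $n \geq 1$ is assumed):

```cpp
#include <bit>
#include <cstdint>
#include <print>
#include <string>

// Elias gamma (as in the previous sketch), needed as delta's prefix.
std::string elias_gamma(std::uint64_t n) {
    const int k = std::bit_width(n) - 1;
    std::string out(k, '0');
    for (int b = k; b >= 0; --b)
        out.push_back(((n >> b) & 1) ? '1' : '0');
    return out;
}

// Elias delta for n >= 1: gamma-encode the bit count of n, then emit
// n's bits below its leading 1. Lengths grow like log n + 2 log log n,
// versus gamma's 2 log n: shorter for large n, one level of recursion
// deeper. Omega iterates this idea until the prefix encodes itself.
std::string elias_delta(std::uint64_t n) {
    const int k = std::bit_width(n) - 1;   // n has k+1 significant bits
    std::string out = elias_gamma(k + 1);  // recursively encode the length
    for (int b = k - 1; b >= 0; --b)       // n without its leading 1
        out.push_back(((n >> b) & 1) ? '1' : '0');
    return out;
}

int main() {
    for (std::uint64_t n : {1, 2, 13, 1000})
        std::println("delta({}) = {}", n, elias_delta(n));
    // delta(1) = 1, delta(2) = 0100, delta(13) = 00100101
}
```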
Fibonacci coding uses Zeckendorf's representation to produce self-synchronizing codewords. Every codeword ends in two consecutive ones; a single bit flip corrupts at most two codewords.
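A sketch of the encoder (the name `fib_encode` is ours, and $n \geq 1$ is assumed):

```cpp
#include <cstddef>
#include <cstdint>
#include <print>
#include <string>
#include <vector>

// Fibonacci coding for n >= 1, via the Zeckendorf representation:
// greedily take the largest Fibonacci number that fits, which never
// selects two consecutive ones. Bits are emitted for F(2)=1, F(3)=2,
// F(4)=3, ... smallest first, then a final '1' is appended, so "11"
// occurs exactly once, at the end of every codeword.
std::string fib_encode(std::uint64_t n) {
    std::vector<std::uint64_t> fib{1, 2};  // F(2), F(3), ...
    while (fib.back() <= n)
        fib.push_back(fib[fib.size() - 1] + fib[fib.size() - 2]);
    std::string rev;  // bits from the largest Fibonacci down
    bool started = false;
    for (std::size_t i = fib.size(); i-- > 0; ) {
        if (fib[i] <= n) {
            n -= fib[i];
            rev.push_back('1');
            started = true;
        } else if (started) {
            rev.push_back('0');
        }
    }
    std::string out(rev.rbegin(), rev.rend());  // smallest Fibonacci first
    out.push_back('1');                         // terminator: trailing "11"
    return out;
}

int main() {
    for (std::uint64_t n : {1, 2, 3, 4, 12})
        std::println("fib({}) = {}", n, fib_encode(n));
    // fib(1) = 11, fib(2) = 011, fib(3) = 0011, fib(4) = 1011, fib(12) = 101011
}
```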
Rice and Golomb codes are parametric: a single parameter $k$ (or $m$) tunes the code to a specific geometric distribution. Choosing $k$ is choosing your prior precisely.
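A sketch of the Rice case, where $m = 2^k$ turns the Golomb division into a shift (the name `rice_encode` is ours):

```cpp
#include <cstdint>
#include <print>
#include <string>

// Rice code with parameter k (Golomb with m = 2^k), for n >= 0:
// the quotient n >> k in unary (a run of ones closed by a zero),
// followed by the low k bits of n in binary. Geometric sources with
// mean near 2^k keep the quotient small, which is where this wins.
std::string rice_encode(std::uint64_t n, unsigned k) {
    std::string out(n >> k, '1');    // unary quotient
    out.push_back('0');              // unary terminator
    for (unsigned b = k; b-- > 0; )  // k-bit remainder, MSB first
        out.push_back(((n >> b) & 1) ? '1' : '0');
    return out;
}

int main() {
    for (std::uint64_t n : {0, 3, 9, 20})
        std::println("rice_2({}) = {}", n, rice_encode(n, 2));
    // rice_2(0) = 000, rice_2(3) = 011, rice_2(9) = 11001, rice_2(20) = 11111000
}
```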
VByte trades bit-level precision for byte alignment, and that trade wins in practice. Most production columnar databases and network protocols use VByte for integer encoding.
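A sketch of the common little-endian variant (LEB128-style: low seven bits first, high bit set while more bytes follow); the name `vbyte_encode` is ours, and other VByte variants permute the same ingredients.

```cpp
#include <cstdint>
#include <print>
#include <vector>

// VByte (LEB128-style variant): seven payload bits per byte, low
// bits first; the high bit says "another byte follows". Small values
// cost one byte, and every codeword ends on a byte boundary, which is
// what makes decoding branch-light compared to bit-granular codes.
std::vector<std::uint8_t> vbyte_encode(std::uint64_t n) {
    std::vector<std::uint8_t> out;
    while (n >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(n) | 0x80);  // continuation set
        n >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(n));  // final byte, high bit clear
    return out;
}

int main() {
    for (std::uint64_t n : {5, 300, 1000000}) {
        std::print("vbyte({}) =", n);
        for (std::uint8_t byte : vbyte_encode(n))
            std::print(" {:#04x}", byte);
        std::println("");
    }
    // vbyte(5) = 0x05, vbyte(300) = 0xac 0x02, vbyte(1000000) = 0xc0 0x84 0x3d
}
```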
Given a finite distribution, Huffman's algorithm builds the prefix-free code with minimum expected length. It is the first entropy-optimal code in this series.
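A sketch that keeps only what this series cares about, the lengths (the name `huffman_lengths` is ours; ties in the heap are broken arbitrarily, which can permute lengths among equal-probability symbols without changing the expected length):

```cpp
#include <cstddef>
#include <print>
#include <queue>
#include <utility>
#include <vector>

// Huffman's algorithm, reduced to codeword lengths: repeatedly merge
// the two lightest subtrees; each merge pushes every leaf in both
// subtrees one level deeper, so it suffices to track, per subtree,
// its weight and its leaves' depths. The resulting lengths always
// saturate Kraft: sum_i 2^{-l_i} = 1.
std::vector<unsigned> huffman_lengths(const std::vector<double>& probs) {
    // heap entry: (subtree weight, list of (symbol index, depth))
    using Tree = std::pair<double, std::vector<std::pair<std::size_t, unsigned>>>;
    auto lighter = [](const Tree& a, const Tree& b) { return a.first > b.first; };
    std::priority_queue<Tree, std::vector<Tree>, decltype(lighter)> heap(lighter);
    for (std::size_t i = 0; i < probs.size(); ++i)
        heap.push({probs[i], {{i, 0u}}});
    while (heap.size() > 1) {
        Tree a = heap.top(); heap.pop();
        Tree b = heap.top(); heap.pop();
        for (auto& leaf : a.second) ++leaf.second;  // one level deeper
        for (auto& leaf : b.second) ++leaf.second;
        a.second.insert(a.second.end(), b.second.begin(), b.second.end());
        heap.push({a.first + b.first, std::move(a.second)});
    }
    std::vector<unsigned> lengths(probs.size());
    for (const auto& [symbol, depth] : heap.top().second)
        lengths[symbol] = depth;
    return lengths;
}

int main() {
    // A dyadic source: Huffman recovers exactly -log2 p_i bits.
    std::println("{}", huffman_lengths({0.5, 0.25, 0.125, 0.125}));
    // prints: [1, 2, 3, 3]
}
```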
Arithmetic coding closes the gap between Huffman's per-symbol integer lengths and true entropy. A single number in the unit interval encodes an entire sequence; 32-bit integer arithmetic makes it practical.
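A worked taste of the interval narrowing, on a toy source with $p(a) = 2/3$, $p(b) = 1/3$ and the message $aab$: each symbol shrinks the current interval to its own sub-interval,

$$[0,1) \;\xrightarrow{a}\; \left[0, \tfrac{2}{3}\right) \;\xrightarrow{a}\; \left[0, \tfrac{4}{9}\right) \;\xrightarrow{b}\; \left[\tfrac{8}{27}, \tfrac{12}{27}\right).$$

The final width is $\tfrac{2}{3} \cdot \tfrac{2}{3} \cdot \tfrac{1}{3} = \tfrac{4}{27}$, the product of the symbol probabilities, so any number in that interval identifies the whole message in about $\log_2 \tfrac{27}{4} \approx 2.75$ bits, with no rounding up to a whole bit per symbol.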