Bits Follow Types
Codecs are not ad-hoc bit formats. They are constructions on the algebraic structure of types.
Browse posts by tag
Codecs are not ad-hoc bit formats. They are constructions on the algebraic structure of types.
Prefix-freeness is the property that lifts the free-monoid construction into bit space.
The free monoid on a set is the type of lists over that set. The universal property says fold is the unique homomorphism from lists to any monoid. This explains why lists, multisets, and polynomials appear everywhere.
A homomorphism preserves structure. fold is the universal homomorphism from the free monoid. This is the algebraic reason that fold, evaluation, and parallelism work.
A lattice has two operations, meet and join, satisfying absorption laws. Tarski's theorem gives a generic fixed-point algorithm. Lattice structure determines the iteration, just as monoid structure determines power-by-squaring.
A semiring has two monoidal operations linked by distributivity. Matrix multiplication over different semirings gives shortest paths, longest paths, widest paths, reachability, and path counting, all from the same code.
Online accumulators are monoids. Default construction is the identity, combination via += is the binary operation, and parallel composition gives the product monoid, computing arbitrary statistics in a single pass.
A guided tour through my open-source ecosystem: encrypted search theory, statistical reliability, Unix-philosophy CLI tools, AI research, and speculative fiction. How the projects connect and where to start.
Many structures come in pairs: forward/reverse AD, push/pull iteration, encode/decode. Recognizing duality lets you transfer theorems and insights between domains.
A reflection on eleven explorations in generic programming, and how algorithms arise from algebraic structure.
18-part lecture series on efficient programming. Covers the intellectual foundations behind STL.
Rigorous foundations of generic programming. Connects algebra and algorithms. Stepanov’s magnum opus.
Classic talk on recognizing algorithmic patterns. ‘No raw loops’ - shows how rotate solves many problems elegantly.
A C++20 header-only library for algebraic text processing and compositional parsing with fuzzy matching.
The Stepanov series showed that algorithms arise from algebraic structure. This post is about the flip side: sometimes you choose a different structure to make the algorithm trivial.
A C++ header-only library that treats disjoint interval sets as proper mathematical objects with Boolean algebra operations.
ZeroIPC treats shared memory not as passive storage but as an active computational substrate, bringing futures, lazy evaluation, reactive streams, and CSP channels to IPC with zero-copy performance.
Three approaches to computing derivatives, forward-mode AD, reverse-mode AD, and finite differences, each with different trade-offs for numerical computing and machine learning.
Arithmetic coding closes the gap between Huffman's per-symbol integer lengths and true entropy. A single number in the unit interval encodes an entire sequence; 32-bit integer arithmetic makes it practical.
Given a finite distribution, Huffman's algorithm builds the prefix-free code with minimum expected length. It is the first entropy-optimal code in this series.
A key-value store built on memory-mapped I/O, approximate perfect hashing, and lock-free atomics. Sub-100ns median latency, 10M ops/sec single-threaded.
A header-only C++20 library that achieves 3-10x compression with zero marshaling overhead using prefix-free codes and Stepanov-style generic programming.
A C++20 library for composing online statistical accumulators with numerically stable algorithms and algebraic composition.
VByte trades bit-level precision for byte-alignment, and that trade wins in practice. Most production columnar databases and network protocols use VByte for integer encoding.
Sean Parent's type erasure gives you value-semantic polymorphism without inheritance. Combined with Stepanov's algebraic thinking, you can type-erase entire algebraic structures.
Rice and Golomb codes are parametric: a single parameter k (or m) tunes the code to a specific geometric distribution. Choosing k is choosing your prior precisely.
Numerical integration meets generic programming. By requiring only ordered field operations, the quadrature routines work with dual numbers, giving you differentiation under the integral for free.
Fibonacci coding uses Zeckendorf's representation to produce self-synchronizing codewords. Every codeword ends in two consecutive ones; a single bit flip corrupts at most two codewords.
Reverse-mode automatic differentiation is just the chain rule applied systematically. I built one in C++20 to understand what PyTorch and JAX are actually doing.
Elias delta and omega extend Elias gamma by recursively encoding the length prefix. Each step yields shorter codewords for large integers at a small constant cost for small ones.
A C++ library for composable hash functions using algebraic structure over XOR, with template metaprogramming.
Unary and Elias gamma are the two simplest universal codes. Unary encodes n in n bits; gamma in 2 log2(n)+1 bits. Each implies a different prior over the integers.
Choosing step size h for finite differences: small enough for a good approximation, not so small that floating-point errors eat your lunch.
Every prefix-free code is a hypothesis about the source. The codeword lengths determine an implicit probability distribution; the code is optimal when that prior matches the true source.
Dual numbers extend our number system with an infinitesimal epsilon where epsilon^2 = 0. Evaluating f(x + epsilon) yields f(x) + epsilon * f'(x)—the derivative emerges automatically from the algebra.
elementa is a linear algebra library built to teach. Every design decision prioritizes clarity over cleverness. Code that reads like a textbook and compiles.
Any length vector satisfying Kraft has a prefix-free code. Here is the construction.
The same GCD algorithm works for integers and polynomials because both are Euclidean domains. One structure, many types, same algorithms.
Which codeword-length vectors are achievable by prefix-free codes? Kraft's inequality is the answer.
Rational numbers give exact arithmetic where floating-point fails. The implementation connects GCD, the Stern-Brocot tree, and the algebraic structure of fields.
Iterators reduce the NxM algorithm-container problem to N+M by interposing an abstraction layer, following Stepanov's generic programming approach.
The Miller-Rabin primality test demonstrates how probabilistic algorithms achieve arbitrary certainty, trading absolute truth for practical efficiency.
Integers modulo N form a ring, an algebraic structure that determines which algorithms apply. Understanding this structure unlocks algorithms from cryptography to competitive programming.
The Russian peasant algorithm computes products, powers, Fibonacci numbers, and more, once you see the underlying algebraic structure.
What if containers wasted zero bits? A C++ library for packing arbitrary value types at the bit level using pluggable codecs.