Bits Follow Types
Codecs are not ad-hoc bit formats. They are constructions on the algebraic structure of types.
Prefix-freeness is the property that lifts the free-monoid construction into bit space.
I asked an AI to analyze 140+ repos and 50+ papers as a dataset. The unifying thesis it found: compositional abstractions for computing under ignorance.
Arithmetic coding closes the gap between Huffman's per-symbol integer lengths and true entropy. A single number in the unit interval encodes an entire sequence; 32-bit integer arithmetic makes it practical.
Solomonoff induction, MDL, speed priors, and neural networks are all special cases of one Bayesian framework with four knobs.
Given a finite distribution, Huffman's algorithm builds the prefix-free code with minimum expected length. It is the first entropy-optimal code in this series.
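The greedy merge the teaser describes fits in a few lines. This is a minimal sketch (not the post's own code): repeatedly merge the two lightest subtrees, prefixing `0` to one side and `1` to the other.

```python
import heapq

def huffman(freqs):
    """Build prefix-free codewords from {symbol: weight}.
    Assumes at least two symbols. The integer tiebreaker keeps
    the heap comparisons well-defined when weights are equal."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, i, merged))
        i += 1
    return heap[0][2]
```

For weights {a: 5, b: 2, c: 1, d: 1} this yields lengths 1, 2, 3, 3 — the minimum expected length for that distribution.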
Cryptographic theory assumes random oracles with infinite output. We have 256 bits. This paper explores how we bridge that gap, and what it means that we can.
VByte trades bit-level precision for byte-alignment, and that trade wins in practice. Most production columnar databases and network protocols use VByte for integer encoding.
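The byte-alignment trade is easy to see in code. A minimal sketch (one common convention: 7 payload bits per byte, high bit set on the terminating byte — some implementations invert this):

```python
def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte."""
    out = bytearray()
    while n >= 0x80:
        out.append(n & 0x7F)   # low 7 bits, high bit clear = "more follows"
        n >>= 7
    out.append(n | 0x80)       # final byte: high bit set
    return bytes(out)

def vbyte_decode(buf: bytes) -> int:
    n, shift = 0, 0
    for b in buf:
        n |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:           # terminator byte reached
            break
    return n
```

Values up to 127 take one byte, up to 16383 two bytes, and so on; the decoder touches whole bytes only, which is why it wins on real hardware.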
Entropy maps use prefix-free hash codes to approximate functions without storing the domain, achieving information-theoretic space bounds with controllable error.
Space bounds, entropy requirements, and cryptographic security properties of perfect hash functions.
Rice and Golomb codes are parametric: a single parameter k (or m) tunes the code to a specific geometric distribution. Choosing k is choosing your prior precisely.
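The Rice case (Golomb with m = 2^k) makes the parameter's role concrete. A minimal sketch, assuming the usual quotient-in-unary, remainder-in-binary layout:

```python
def rice_encode(n: int, k: int) -> str:
    """Rice code with parameter k: quotient q = n >> k in unary
    (q ones then a zero), remainder in exactly k binary bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")
```

Larger k means shorter unary parts for big n but a longer fixed tail — exactly the prior-tuning the teaser describes.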
Fibonacci coding uses Zeckendorf's representation to produce self-synchronizing codewords. Every codeword ends in two consecutive ones; a single bit flip corrupts at most two codewords.
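The Zeckendorf construction the teaser names can be sketched directly (my own illustration, not the post's code): greedily subtract the largest Fibonacci number, emit the bits least-significant-first, then append the terminating one that produces the "11" end marker.

```python
def fib_encode(n: int) -> str:
    """Fibonacci code for n >= 1 via Zeckendorf's representation:
    a sum of non-consecutive Fibonacci numbers."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):     # greedy: largest Fibonacci first
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # least-significant Fibonacci first, then the terminating "1"
    return "".join(bits).lstrip("0")[::-1] + "1"
```

Because Zeckendorf representations never use two consecutive Fibonacci numbers, "11" appears only at codeword boundaries — the self-synchronization property.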
Elias delta and omega extend Elias gamma by recursively encoding the length prefix. Each step yields shorter codewords for large integers at a small constant cost for small ones.
Unary and Elias gamma are the two simplest universal codes. Unary encodes n in n bits; gamma in 2⌊log₂ n⌋ + 1 bits. Each implies a different prior over the integers.
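Both codes fit in a couple of lines each. A minimal sketch, assuming the convention that unary writes n − 1 zeros and a terminating one (so n takes exactly n bits):

```python
def unary(n: int) -> str:
    # n >= 1 encoded as (n - 1) zeros followed by a one: n bits total
    return "0" * (n - 1) + "1"

def elias_gamma(n: int) -> str:
    # n >= 1: floor(log2 n) zeros, then n in binary (leading 1 included)
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b   # 2*floor(log2 n) + 1 bits total
```

The zero run in gamma is just the length of the binary part, written in unary — which is why gamma's implicit prior decays much more slowly than unary's.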
Every prefix-free code is a hypothesis about the source. The codeword lengths determine an implicit probability distribution; the code is optimal when that prior matches the true source.
Any length vector satisfying Kraft has a prefix-free code. Here is the construction.
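The construction can be sketched in a few lines (my paraphrase, not necessarily the post's exact presentation): process lengths in ascending order and assign each codeword the next available value at its length, left-shifting as lengths grow.

```python
def kraft_code(lengths):
    """Given lengths with sum(2^-l) <= 1, build a prefix-free code.
    Returns {original index: codeword string}."""
    assert sum(2.0 ** -l for l in lengths) <= 1.0, "Kraft inequality violated"
    code = {}
    c, prev = 0, 0   # next codeword value, current length
    for i, l in sorted(enumerate(lengths), key=lambda p: p[1]):
        c <<= (l - prev)              # extend to the new length
        code[i] = format(c, f"0{l}b") # c written in exactly l bits
        c += 1
        prev = l
    return code
```

Kraft's inequality is exactly what guarantees the shift-and-increment never overflows l bits, so no codeword can be a prefix of a later, longer one.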
Which codeword-length vectors are achievable by prefix-free codes? Kraft's inequality is the answer.
What if containers wasted zero bits? A C++ library for packing arbitrary value types at the bit level using pluggable codecs.
The problem of predicting what comes next, from compression to language models