PFC: Zero-Copy Data Compression Through Prefix-Free Codecs
A header-only C++20 library that achieves 3-10x compression with zero marshaling overhead using prefix-free codes and Stepanov-style generic programming.
Browse posts by tag
A header-only C++20 library that achieves 3-10x compression with zero marshaling overhead using prefix-free codes and Stepanov-style generic programming.
VByte trades bit-level precision for byte-alignment, and that trade wins in practice. Most production columnar databases and network protocols use VByte for integer encoding.
Rice and Golomb codes are parametric: a single parameter k (or m) tunes the code to a specific geometric distribution. Choosing k is choosing your prior precisely.
Fibonacci coding uses Zeckendorf's representation to produce self-synchronizing codewords. Every codeword ends in two consecutive ones; a single bit flip corrupts at most two codewords.
Elias delta and omega extend Elias gamma by recursively encoding the length prefix. Each step yields shorter codewords for large integers at a small constant cost for small ones.
Unary and Elias gamma are the two simplest universal codes. Unary encodes n in n bits; gamma in 2 log2(n)+1 bits. Each implies a different prior over the integers.
Every prefix-free code is a hypothesis about the source. The codeword lengths determine an implicit probability distribution; the code is optimal when that prior matches the true source.