The Bernoulli Model
Random approximate computation: the Bernoulli model, its papers, and the monograph that unifies them. Bloom filters are the special case; this is the general theory.
The one-sentence version. A Bernoulli set is a set you can store far below the information-theoretic cost of the exact set, in exchange for a known, tunable rate of false positives and false negatives. The Bernoulli model is the framework that makes that trade precise and then carries it everywhere: from sets to maps, relations, and arbitrary algebraic types, with closed-form error propagation through every operation.
A Bloom filter is the special case everyone already knows: false positives, no false negatives, one of the two error rates pinned to zero. The model generalizes it. Both error rates are first-class and tunable, the construction is provably space-optimal at the information bound, and the same algebra that governs a single membership test governs unions, intersections, joins, function composition, and the Boolean post-processing of noisy measurements.
The monograph: the center of gravity
Everything below is being drawn together into one book.
Bernoulli Types: An Algebra of Approximate Computation builds the model from a single atom, the Bernoulli Boolean, and composes outward: random approximate sets, their entropy and space-optimal construction, maps, relations, and a full algebra of approximate data types. Each paper below is roughly a chapter’s worth of the argument. The monograph is where they become one coherent theory instead of a stack of separate preprints, and it is the thing I most want read. It is in active development.
Read it: the monograph as a PDF (276 pages).
The papers
A family of preprints, each a self-contained result the monograph consolidates. All carry minted Zenodo DOIs.
Foundations
| Paper | What it establishes |
|---|---|
| Bernoulli Sets: A Model for Random Approximate Sets | The model itself: the two axioms (element-wise independence and conditional block independence), the false-positive / false-negative parameterization, the error-count distributions, and higher-order composition. The hub the rest build on. |
| Composition of Bernoulli Sets | Closed-form error propagation through union, intersection, complement, and difference, the monoid structure, and the relational subset and equality predicates. |
| Binary Classification Measures for Random Approximate Sets | PPV, NPV, accuracy, and Youden’s J as random variables, with prevalence sensitivity by the delta method. |
| Entropy and Space Complexity of Random Approximate Sets | The information-theoretic lower bound on bits per element and the space-accuracy duality that every optimal construction must meet. |
| The Bernoulli Model: A Probabilistic Framework for Data Structures and Types | The type-theoretic generalization: Bool as the atom, approximate sum and product types, and the computational basis from which the rest is built. |
Optimal constructions
| Paper | What it establishes |
|---|---|
| The Bernoulli Hash Function: Optimal Bernoulli Sets and Bernoulli Maps | The space-optimal construction that meets the entropy bound: the threshold predicate, generalized acceptance, and a unified set/map framework. |
| The Perfect Hash Filter | A concrete, efficient implementation of the positive Bernoulli set and map via perfect hashing, with a ranked-search application. |
Extensions
| Paper | What it adds |
|---|---|
| The Algebra of the Random Approximate Map Model | Functions from X to Y as first-class approximate objects, and the error rates of map composition. |
| The Abstract Data Type of the Approximate Relation | Relational algebra (selection, projection, join) over approximate relations, and a closure dichotomy for structural invariants: a class-induced invariant is free, transitivity admits no bounded closure. |
Applications
| Paper | What it adds |
|---|---|
| A Forward (ε, ω) Calculus for Quantum Readout Error | The model as the error calculus for the classical post-processing of quantum measurement, validated on real device calibration. Under review at Quantum Science and Technology. |
The software
bernoulli-types is the reference implementation in Python: the atom, the sets, the maps, and the sketches, built outward in phases that mirror the monograph’s parts, with optional plain-C acceleration and pure-Python fallbacks. It is both the research artifact and the book’s executable companion. Source.
Notes and posts
Shorter writing on the model, tagged bernoulli-model:
- A Boolean Algebra Over Trapdoors · June 2023A Boolean algebra framework over trapdoors for cryptographic operations. Introduces a homomorphism from powerset Boolean algebra to n-bit strings via cryptographic hash functions, enabling secure …
Connection to trapdoor computing
The Bernoulli model is also the error model for Trapdoor Computing. At the mathematical level a Bernoulli map is a cipher map: the construction’s salt is the trapdoor key, the acceptance predicate implements the cipher map, and the space formula \(-\log_2(\varepsilon) + \mu\) governs both. The composition theorem \(\eta_{\text{total}} = 1 - \prod_i (1 - \eta_i)\) is the same law on both sides. If you came here from the cryptography, that is the bridge.
Alexander Towell, metafunctor.com.