Overview
A unified C++ framework for probabilistic data structures based on the fundamental distinction between latent (true but unobservable) values and their observed (noisy but measurable) approximations. By making approximation a first-class concept at the type level, Bernoulli types enable formal reasoning about error propagation in probabilistic algorithms.
Key Concepts
Latent vs. Observed
The framework distinguishes:
- Latent values (
S
,f
,x
): True mathematical objects we care about - Observed values (
~S
,~f
,~x
): Noisy approximations we compute with
// Latent set S - exists mathematically
Set<int> S = {1, 2, 3, 4, 5};
// Observed set ~S - Bloom filter approximation
bloom_filter<int> obs_S(1000, 3);
obs_S.insert(1);
obs_S.insert(2);
// Query may have false positives
if (obs_S.contains(6)) { // Might be true even though 6 ∉ S
// Handle approximate result
}
Examples in Practice
- Bloom filters: Observe latent sets through membership queries
- Count-Min sketches: Observe latent frequency distributions
- MinHash: Observe latent set similarities
- Miller-Rabin: Observe latent primality property
Mathematical Foundation
Type System
- Type erasure: Store different implementations uniformly
- Expression templates: Lazy evaluation of set operations
- Error propagation: Track uncertainty through computations
- Interval arithmetic: Handle uncertain error rates
Operations
Set-theoretic operations preserve Bernoulli properties:
- Union:
~S1 ∪ ~S2
with error rate composition - Intersection:
~S1 ∩ ~S2
with compounded approximation - Difference:
~S1 \ ~S2
with modified error characteristics
Features
Data Structures
- Bloom filters - Space-efficient set membership
- Count-Min sketch - Frequency estimation
- MinHash - Set similarity
- Cuckoo filters - Deletable Bloom filters
Algorithms
- Space-optimal constructions - Achieve information-theoretic bounds
- Adaptive error rates - Adjust parameters dynamically
- Majority voting - Improve accuracy with multiple observations
Example
#include <bernoulli/observed_bool.hpp>
#include <bernoulli/bloom_filter.hpp>
// Observed boolean with 10% error rate
observed_bool obs_true(true, 0.1);
// Bloom filter as observed set
bloom_filter<std::string> filter(10000, 4);
filter.insert("hello");
filter.insert("world");
// Membership test with explicit error
auto fpr = filter.false_positive_rate();
std::cout << "FPR: [" << fpr.min << ", " << fpr.max << "]" << std::endl;
Research Papers
The repository includes LaTeX source for seven research papers:
- Foundations of Bernoulli Types - Core framework and mathematical foundations
- Bernoulli Sets and Boolean Algebra - Set-theoretic operations and error propagation
- Universal Bernoulli Maps - Function approximation and composition
- Regular Types and Bernoulli Types - Type algebra and limitations
- Applications to Search and Retrieval - Information retrieval applications
- Entropy Maps Implementation - Space-optimal implementations
- Statistical Analysis - Distributions and confidence intervals
Applications
- Privacy-preserving data structures (oblivious types)
- Space-efficient approximate algorithms
- Distributed systems with probabilistic guarantees
- Stream processing with bounded memory
- Encrypted search with controlled leakage
Resources
- PDF: Bernoulli Types Book
- Code: GitHub Repository
- Documentation: Theory Guide
Related Work
- Bernoulli Data Types - Implementation framework
- Cipher Trapdoor Sets - Cryptographic applications
- Encrypted Search - Privacy-preserving search systems