A Boolean Algebra Over Trapdoors
This project is available on GitHub.
Boolean Algebra
A Boolean algebra is a mathematical structure that captures the properties of logical operations and sets. Formally, it is defined as a 6-tuple
is a set of elements, ( ) and are binary operations on , ( ) is a unary operation on , and are elements of , often referred to as the minimum and maximum elements, respectively.
These components must satisfy certain axioms, including closure of
Boolean algebras have far-reaching application. They form the foundation of propositional logic and are fundamental to the design of digital circuits and computer architecture [2].
In set theory, a common representation of a Boolean algebra is the power set of a set
, (set intersection), (set union), (set complement), (empty set), (universal set).
This set-theoretic Boolean algebra,
Homomorphisms in Boolean Algebra
A key concept in our exploration of trapdoor Boolean algebras is that of a homomorphism. In abstract algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type. In the context of Boolean algebras, a homomorphism is a function that preserves the operations and special elements of the Boolean algebra.
Given two Boolean algebras
In other words, a homomorphism preserves the structure of the Boolean algebra across the mapping. This preservation of structure is crucial as it allows us to perform operations in one Boolean algebra and have them correspond meaningfully to operations in another Boolean algebra [4].
Homomorphisms play a vital role in our construction of trapdoor Boolean algebras. They allow us to create a mapping between our original Boolean algebra and a new structure that incorporates cryptographic elements, while still maintaining the essential properties of a Boolean algebra. This preservation of structure ensures that operations performed in our trapdoor Boolean algebra still behave in ways that are logically consistent with standard Boolean operations.
In the following sections, we will introduce a specific homomorphism
The Bernoulli Model
Before we introduce our Bernoulli homomorphism, it’s crucial to understand the underlying framework: the Bernoulli model.
The Bernoulli model is a probabilistic framework for representing and reasoning about approximations of values (where a value can be anything from a set, whether the set is something simple like
It introduces a kind of controlled uncertainty into computations, allowing for trade-offs between accuracy and other desirable properties such as space efficiency or security.
Formally, for any type
We may want to discuss the number of ways in which the Bernoulli Model can introduce errors. For a first-order Bernoulli model,
Key properties of the Bernoulli Model include:
Propagation of uncertainty: When operations are performed on Bernoulli approximations, the uncertainties combine in well-defined ways.
Trade-off between accuracy and other properties: By adjusting the probabilities, we can balance accuracy against other desirable characteristics of our system.
Generalization to complex types: The Bernoulli model can be applied to simple types like Boolean values, as well as to more complex types including functions and algebraic structures.
In the context of our work on an approximate Boolean algebra over trapdoors, the Bernoulli Model provides a framework for analyzing and reasoning about the behavior of our cryptographic constructions.
A Boolean Algebra Over Free Semigroups
Consider the Boolean algebra
For example, if
Commonly, for Boolean alebras over finite sets, one approach is to represent sets with bit strings of length
In our case, we have a Boolean albebra over the infinitely large free semigroup
Conceptually, we have an approximate Boolean algebra over the free semigroup
Approximate Boolean Algebra Over Trapdoors
Consider the Boolean alegebra
The operations in
We assume
If we map
These discrepenancies are a consequence of the finite number (
The cryptographic hash function
Homomorphism Properties
We have two Boolean algebras,
To satisfy the properties of a homomorphism, the following properties must hold:
We show that these properties hold for all the properties except the third, which is not satisfied due to the approximate nature of the Boolean algebra over trapdoors. For this reason, we say that
Proof of First Property: We seek to prove the identity
Now, let’s start from the RHS and work our way to the same result:
Proof of Second Property: We seek to prove the identity
Now, let’s start from the RHS and work our way to the same result:
Proof of Fourth Property: We seek to prove the identity
Proof of Fifth Property: We seek to prove the identity
We have proven these properties for the homomorphism
Disproof of Third Property: We seek to disprove the identity
We assume
Now, let’s start from the RHS and work our way to a decidedly different result:
Single-Level Hashing Scheme
In the next section, Two-Level Hashing Scheme
, we will introduce a two-level hashing scheme that reduces the space complexity of the single-level hashing scheme. In this section, we derive the space complexity of the single-level hashing scheme for representing free semigroups as “dense” bit strings of size
Relational Predicates
In this section, we view the Boolean algebra in a set-theoretic context. We then define some predicates that are fundamental to the algebra, namely membership and subset relations.
Set Membership
Set membership, denoted by
For our approximate Boolean algebra
This just means we test for membership by testing that if
When we take the union of two sets, we take the bitwise
Now, let’s consider the union of two singleton sets,
Let us denote the hash of
By De Morgan’s laws,
We can simplify this result using the approximation
The probability that all of the bits are set to 1 is just the product of the probabilities that each of the bits is set to 1:
Since
We use these results to compute the probability of a false positive in the membership and subset relations in the next section.
False Negatives and False Positives
Suppose that we have a set
If
We are interested in computing the probability of this event:
Recall that we assume
For
Asymptotic False Positive Rate
Given the equation
Space Complexity
Let’s rewrite this where we show
We demonstrated that the probability of a false positive is a function of the number of elements

In Figure 1, the false positive rate decreases exponentially as the byte size of the hash increases for smaller sets of elements (k=4 to k=10). In Figure 2, we observe the false positive rate behavior over larger kilobyte sizes for larger sets (k=12 to k=16). The green dashed line represents the 5% false positive rate threshold. As k increases, achieving this threshold requires significantly more space, highlighting the trade-off between set size and hash size.
Subset Relation
The subset relation has a predicate
In our approximate Boolean algebra
Let us repurpose the earlier notation, letting
For
Recall that we may approximate
Since equality can be written as
Two-Level Hashing Scheme
In previous sections, we derived the space complexity of a single-level hashing scheme for
representing free semigroups as “dense” bit strings of size
The two-level hashing scheme is a hierarchical structure that partitions the elements of the set into smaller subsets. For efficiency and compatability with existing hashing algorithms, when we hash an element, we do the following construction:
- We have a hash function that outputs
bits. - The first
bits of the hash are used to determine the subset (bin) to which the element belongs, subsets in total. - The remaining
bits of the hash are used to represent the element within the subset. - We have
bins and bits for each bin, resulting in a total of bits.
This two-level hashing scheme allows us to reduce the number of bits in the hash for a given false positive rate
False Positive Rate
Membership Relation
Assume
- With probability
, it maps to some particular subset. - In that subset, represented by
bits, there are expected to be elements. - We use the earlier derivation to find the probability of a false positive in the subset:
Keeping
The space complexity of the two-level hashing scheme is given by
In Figure 3, we show the false positive rate for different values of
C++ Implementation: Single-Level Hashing Scheme
We can implement a C++ class that models the Boolean algebra over trapdoors.
We generalize the concept of trapdoors to any type
Each trapdoor<X,N>
represents a value of type X
as a hash of size N
. This
value is represented as a bit string of size N
that is hashed using a cryptographic
hash function. It too a Bernolli Model of the equality operator, which returns a Bernoulli
Boolean that represents a Boolean value indicating if the trapdoors are equal with a false positive rate of
We model it as a Boolean algebra over bit strings, where the operations are the bitwise
The trapdoor_set<X,N>
class represents an approximate Boolean algebra over trapdoors (of type X
), as we previously discussed. It is a specialization of trapdoor<X,N>
that implements additional opoerations that can take place over the domain of trapdoor<X,N>
and trapdoor_set<X,N>
:
empty_trapdoor_set<X,N>()
returns an empty set ( ).universal_trapdoor_set<X,N>()
returns a universal set ( ).operator+(trapdoor_set<X,N> const & x, trapdoor_set<X,N> const & y)
returns the union of two sets ( ).operator~(trapdoor_set<X,N> const & x)
returns the complement of a set ( ).operator*(trapdoor_set<X,N> const & x, trapdoor_set<X,N> const & y)
returns the intersection of two sets ( ).empty(trapdoor_set<X,N> const & xs)
returns a Bernoulli Boolean that represents a Boolean value that is true if the set is empty with a false positive rate. Essentially, it is the probability that the hash of the set is zero, , which occurs with a false positive rate .in(trapdoor<X,N> const & x, trapdoor_set<X,N> const & xs)
returns a Bernoulli Boolean that represents a Boolean value that is true if the set contains the element with a false positive rate. See the sectionRelational Predicates
in theSingle-Level Hashing Scheme
section for more details.
The class trapdoor_set
is a template class that takes a type X
and a size N
as template arguments. The class has a value_hash
member that is an array of char
of size N
. The class has a default constructor that initializes the value_hash
to zero. The class has a copy constructor that is defaulted. The class has two static member functions empty_trapdoor_set
and universal_trapdoor_set
that return an empty set and a universal set, respectively. The class has three overloaded operators +
, !
, and *
that implement the union (
For closure, the class has a hash
member function that returns a hash of the set, which is just the hash already stored in it. This means we can compose these operations to form more complex operations, like creating a powerset of trapdoor_set
objects.
#include <array>
template <typename X, size_t N>
struct trapdoor
{
using value_type = X;
/**
* The constructor initializes the value hash to the given value hash.
* Since the hash is a cryptographic hash, the hash is one-way and so we
* cannot recover the original value from the hash.
*
* This also models the concept of an Oblivious Type, where the true value
* is latent and we permit some subset of operations on it. In this case,
* the only operation we permit is the equality and hashing operations.
*
* @param value_hash The value hash.
*/
trapdoor(std::array<char, N> value_hash) : value_hash(value_hash) {}
/**
* The default constructor initializes the value hash to zero. This value
* often denotes a special value of type X, but not necessarily.
*/
trapdoor() { value_hash.fill(0); }
std::array<char, N> value_hash;
};
/**
* The hash function for the trapdoor class. It returns the value hash.
*/
template <typename X, size_t N>
auto hash(trapdoor const & x)
{
return x.value_hash;
}
/**
* Basic equality operator. It returns a Bernoulli Boolean that represents
* a Boolean value indicating if the trapdoors are equal with a false positive rate
* of 2^{-8 N}. Different specializes of the trapdoor class may have different
* false positive rates, but this is a reasonable default value.
*/
template <typename X, size_t N>
auto operator==(trapdoor const & lhs, trapdoor const & rhs) const
{
return bernoulli<bool>{lhs.value_hash == rhs.value_hash, -8*N};
}
/**
* The `or` operation in the Boolean algebra over bit strings.
*/
template <typename X, size_t N>
auto lor(trapdoor lhs, trapdoor const & rhs)
{
for (size_t i = 0; i < N; ++i)
lhs.value_hash[i] |= rhs.value_hash[i];
return lhs;
}
/**
* The `and` operation in the Boolean algebra over bit strings.
*/
template <typename X, size_t N>
auto land(trapdoor lhs, trapdoor const & rhs)
{
for (size_t i = 0; i < N; ++i)
lhs.value_hash[i] &= rhs.value_hash[i];
return lhs;
}
/**
* The `not` operation in the Boolean algebra over bit strings.
*/
template <typename X, size_t N>
auto lnot(trapdoor x)
{
for (size_t i = 0; i < N; ++i)
lhs.value_hash[i] = ~lhs.value_hash[i];
return lhs;
}
template <typename T>
auto minimum() { return std::numeric_limits<T>::min(); }
template <typename T>
auto maximum() { return std::numeric_limits<T>::max(); }
/**
* The `minimum` operation in the Boolean algebra over trapdoors
*/
template <typename trapdoor<typename X, size_t N>>
auto minimum() { return trapdoor<X,N>(); }
/**
* The `maximum` operation in the Boolean algebra over trapdoors
*/
template <typename trapdoor<typename X, size_t N>>
auto maximum()
{
trapdoor<X,N> x;
x.value_hash.fill(std::numeric_limits<char>::min())
return x;
}
/**
* The trapdoor_set class models an approximate Boolean algebra over trapdoors.
* It is a specialization of the trapdoor class that implements additional operations
* that can take place over the domain of trapdoor and trapdoor_set.
*/
template <typename X, size_t N>
struct trapdoor_set: public trapdoor<X,N>
{
trapdoor_set(
double low_k = 0,
double high_k = std::numeric_limits<double>::infinity()),
std::array<char, N> value_hash)
: trapdoor<trapdoor_set<X,N>,N>(value_hash),
low_k(low_k),
high_k(high_k) {}
trapdoor_set() : trapdoor<trapdoor_set<X,N>,N>(), low_k(0), high_k(0) {}
trapdoor_set(trapdoor_set const &) = default;
trapdoor_set & operator=(trapdoor_set const &) = default;
double low_k;
double high_k;
};
/**
* The size (cardinality) of the latent set (the set the trapdoor_set represents).
*
* @param xs The trapdoor_set.
* @return The size range of the set.
*/
template <typename X, size_t N>
auto size(trapdoor_set<X,N> const & xs)
{
return std::make_pair(xs.low_k, xs.high_k);
}
/**
The union operation.
*/
template <typename X, size_t N>
auto operator+(
trapdoor_set<X,N> const & xs,
trapdoor_set<X,N> const & ys)
{
return trapdoor_set<X,N>(lor(xs, ys).value_hash,
std::max(xs.low_k, ys.low_k), xs.high_k + ys.high_k);
}
template <typename X, size_t N>
void insert(trapdoor<X,N> const & x, trapdoor_set<X,N> & xs)
{
for (size_t i = 0; i < N; ++i)
xs.value_hash[i] |= x.value_hash[i];
// since x could already be in xs, we do not increment the low_k
// only the high_k
xs.high_k += 1;
}
template <typename X, size_t N>
auto operator~(trapdoor_set<X,N> x)
{
for (size_t i = 0; i < N; ++i)
x.value_hash[i] = ~x.value_hash[i];
// assume |x| in [a, b]
// then |~x| has the following analysis.
// - if |x| = a, then |~x| = |U| - a, where |U| is universal set
// - if |x| = b, then |~x| = |U| - b
// so, |~x| in [|U|-b, |U|-a]
x.low_k = maximum<X>() - x.high_k;
x.high_k = maximum<X>() - x.low_k;
return x;
}
template <typename X, size_t N>
auto operator*(
trapdoor_set<X,N> const & x,
trapdoor_set<X,N> const & y)
{
return trapdoor_set<X>(x.value_hash & y.value_hash,
0, // intersection could be empty
std::min(x.high_k, y.high_k) // one could be a subset of the other
);
}
template <typename X, size_t N>
bernoulli<bool> in(trapdoor<X,N> const & x, trapdoor_set<X,N> const & xs
{
for (size_t i = 0; i < N; ++i)
if ((x.value_hash[i] & xs.value_hash[i]) != x.value_hash[i])
return bernoulli<bool>{true, 0};
// earlier, we showed that the fp rate is (1 - 2^{-k})^n
return bernoulli<bool>{false, };
}
/**
* The subseteq_B predicate, x \subseteq_B y, is defined as x | y = x.
* It has an false positive rate of eps = (1 - 2^{-(k_1 + k_2 log 2)})^n
* which we approxiate as eps ~ e^{-n e^{-(k_1 + k_2 log 2)}}.
* we store the log(eps) instead: -n e^{-(k_1 + k_2 log 2)}
*/
template <typename X>
bernoulli<bool> operator<=(
trapdoor_set<X> const & x,
trapdoor_set<X> const & y)
{
for (size_t i = 0; i < N; ++i)
if ((x.value_hash[i] & y.value_hash[i]) != x.value_hash[i])
{
return bernoulli<bool>(false, 0); // false negative rate is 0
}
// earlier, we showed that the fp rate is (1 - 2^{-(k_1 + k_2 log 2)})^n
return bernoulli<bool>{true, };
}
template <typename X>
bernoulli<bool> operator==(
trapdoor_set<X> const & x
trapdoor_set<X> const & y)
{
for (size_t i = 0; i < N; ++i)
if (x.value_hash[i] != y.value_hash[i])
return bernoulli<bool>{false, 0.5};
}
Appendix
Marginal Uniformity
Suppose that we have a probability distribution over elements (unigrams) in
When we do a membership query, we uniformly sample one of these representations so that the unigram distribution of elements in
However, this approach has serious shortcomings:
Only the marginal distribution of unigrams are uniformly distributed. Correlations in the joint distributions of the free semigroup
, such as bigrams, are not accounted for. We can apply this transformation to larger sequences, but the space complexity grows exponentially with the length of the sequence for a fixed false positive rate. In practice, for large , even the unigram model may need to be approximated due to space limitations.When we apply the Boolean operations on an untrusted system, it cannot be given the information about the distribution of the elements in
. This means that Boolean operations like and cannot be performed on the untrusted system, only relational querieslike and .