Entropy Maps

February 18, 2024

The PDF version of this post is available on GitHub.

An entropy map approximates a function $f : \mathcal{X} \to \mathcal{Y}$ by hashing domain values to prefix-free codes in the codomain. We store nothing about the domain itself. We just hash, and a prefix of that hash serves as a code for a codomain value.

We allow multiple codes per codomain value. For instance, the value a might be encoded by 00, 01, 10, and 11. If the hash is less than 4, we decode it as a.

Suppose $\Pr\lbrace f(X) = y\rbrace = p_y$ where $X \sim p_X$. The optimally space-efficient code, assuming a uniform hash function $h$, assigns prefix-free codes for $y$ whose probability of being selected by $h$ sums to $p_y$. The expected bit length is

$$ \ell = -\sum_{y \in \mathcal{Y}} p_y \log_2 p_y, $$

which is the entropy of the output distribution. That is why we call it an entropy map.

If $\mathcal{X}$ is finite, we can think of it as implicitly encoding the domain and storing the prefix-free code for each domain element. The average bit length per element is $\ell$, and the total is $|X| \ell$.

Rate distortion: Bernoulli maps

We can allow errors. If one codomain value $y’$ is very common (say $p_{y’} > .99$), we can give it a prefix-free code that covers probability $p_{y’}$ and then skip coding for it in the entropy map. A random $x \in \mathcal{X}$ will map to $y’$ with probability $p_{y’}$ (which can be made as close to 1 as desired by trading space for accuracy). For the remaining domain values, we code them correctly, or allow errors on those too after attempting correct coding.

Bernoulli set-indicator function

Consider a set-indicator function

$$ 1_{\mathcal{A}} : \mathcal{X} \to \lbrace0,1\rbrace, $$

where $\mathcal{A} \subseteq \mathcal{X}$ and $\mathcal{X}$ is very large (possibly infinite). We assign prefix-free codes for codomain value $1$ such that a random hash maps an element of $\mathcal{X}$ to a code for $1$ with probability $\varepsilon$, where $\varepsilon$ is small (say $2^{-10}$).

There exists a (countably infinite) set of hash functions that hash all elements in $\mathcal{A}$ to codes for $1$ and elements in $\mathcal{A}’ = \mathcal{X} \setminus \mathcal{A}$ to codes for either $0$ or $1$. Choosing a random hash function with this property, we expect $\varepsilon$ of the elements in $\mathcal{A}’$ to hash to $1$ (false positives) and the remaining $1 - \varepsilon$ to hash to $0$.

Read More

Entropy Maps

February 18, 2024

The PDF version of this post is available on GitHub.

An entropy map approximates a function $f : \mathcal{X} \to \mathcal{Y}$ by hashing domain values to prefix-free codes in the codomain. We store nothing about the domain itself. We just hash, and a prefix of that hash serves as a code for a codomain value.

We allow multiple codes per codomain value. For instance, the value a might be encoded by 00, 01, 10, and 11. If the hash is less than 4, we decode it as a.

Suppose $\Pr\lbrace f(X) = y\rbrace = p_y$ where $X \sim p_X$. The optimally space-efficient code, assuming a uniform hash function $h$, assigns prefix-free codes for $y$ whose probability of being selected by $h$ sums to $p_y$. The expected bit length is

$$ \ell = -\sum_{y \in \mathcal{Y}} p_y \log_2 p_y, $$

which is the entropy of the output distribution. That is why we call it an entropy map.

If $\mathcal{X}$ is finite, we can think of it as implicitly encoding the domain and storing the prefix-free code for each domain element. The average bit length per element is $\ell$, and the total is $|X| \ell$.

Rate distortion: Bernoulli maps

We can allow errors. If one codomain value $y’$ is very common (say $p_{y’} > .99$), we can give it a prefix-free code that covers probability $p_{y’}$ and then skip coding for it in the entropy map. A random $x \in \mathcal{X}$ will map to $y’$ with probability $p_{y’}$ (which can be made as close to 1 as desired by trading space for accuracy). For the remaining domain values, we code them correctly, or allow errors on those too after attempting correct coding.

Bernoulli set-indicator function

Consider a set-indicator function

$$ 1_{\mathcal{A}} : \mathcal{X} \to \lbrace0,1\rbrace, $$

where $\mathcal{A} \subseteq \mathcal{X}$ and $\mathcal{X}$ is very large (possibly infinite). We assign prefix-free codes for codomain value $1$ such that a random hash maps an element of $\mathcal{X}$ to a code for $1$ with probability $\varepsilon$, where $\varepsilon$ is small (say $2^{-10}$).

There exists a (countably infinite) set of hash functions that hash all elements in $\mathcal{A}$ to codes for $1$ and elements in $\mathcal{A}’ = \mathcal{X} \setminus \mathcal{A}$ to codes for either $0$ or $1$. Choosing a random hash function with this property, we expect $\varepsilon$ of the elements in $\mathcal{A}’$ to hash to $1$ (false positives) and the remaining $1 - \varepsilon$ to hash to $0$.

Read More

A Boolean Algebra Over Trapdoors

June 17, 2023

This project is available on GitHub.

Boolean Algebra

A Boolean algebra is a mathematical structure that captures the properties of logical operations and sets. Formally, it is a 6-tuple $(B, \land, \lor, \neg, 0, 1)$, where

  • $B$ is a set of elements,
  • $\land$ ($\rm{and}$) and $\lor$ ($\rm{or}$) are binary operations on $B$,
  • $\neg$ ($\rm{not}$) is a unary operation on $B$,
  • $0$ and $1$ are elements of $B$, the minimum and maximum elements.

These must satisfy the usual axioms: closure, commutativity, associativity, distributivity, identity, and complements [1].

Boolean algebras show up everywhere. They form the foundation of propositional logic and are fundamental to digital circuit design and computer architecture [2].

In set theory, the standard representation is the power set of a set $X$, denoted $\mathcal{P}(X)$:

  • $B = \mathcal{P}(X)$,
  • $\land = \cap$ (set intersection),
  • $\lor = \cup$ (set union),
  • $\neg = \complement$ (set complement),
  • $0 = \emptyset$ (empty set),
  • $1 = X$ (universal set).

This set-theoretic Boolean algebra, $(\mathcal{P}(X), \cap, \cup, \complement, \emptyset, X)$, is the canonical example and the starting point for what follows: a Boolean algebra over trapdoors [3]. The construction preserves the familiar Boolean algebra properties while introducing cryptographic elements for secure computations.

Homomorphisms in Boolean Algebra

A homomorphism is a structure-preserving map between two algebraic structures of the same type. For Boolean algebras, it is a function that preserves the operations and special elements.

Given two Boolean algebras $(A, \land_A, \lor_A, \neg_A, 0_A, 1_A)$ and $(B, \land_B, \lor_B, \neg_B, 0_B, 1_B)$, a function $f: A \to B$ is a Boolean algebra homomorphism if for all $x, y \in A$:

  1. $f(x \land_A y) = f(x) \land_B f(y)$
  2. $f(x \lor_A y) = f(x) \lor_B f(y)$
  3. $f(\neg_A x) = \neg_B f(x)$
  4. $f(0_A) = 0_B$
  5. $f(1_A) = 1_B$

A homomorphism preserves structure across the mapping: you can perform operations in one algebra and have them correspond to operations in the other [4].

This matters because it lets us build a mapping between our original Boolean algebra and a new structure with cryptographic elements while still maintaining the essential properties. Operations in the trapdoor algebra remain logically consistent with standard Boolean operations.

In the following sections, I introduce a specific homomorphism $F$ that maps elements from our original algebra to a Boolean algebra over bit strings, incorporating a cryptographic hash function. This homomorphism is the foundation of the Boolean algebra over trapdoors.

Read More