Perfect Hashing: Space Bounds, Entropy, and Cryptographic Security

February 1, 2024

Can a perfect hash function be cryptographically secure, space-optimal, and maximum-entropy encoded all at once? This paper proves such a construction exists and analyzes exactly what you sacrifice to get all three.

The Impossible Triangle

Perfect hash functions typically face tradeoffs. Space-optimal constructions (CHD, BDZ) sacrifice randomness. Cryptographic hash functions waste space on collision resistance. Maximum-entropy encodings require extra bits.

The question: can you have all three?

The Construction

The data type is PH = {0,1} x N* with a constructor ph(X, r) = (n', N) where:

  • N = ceil(m/r): Hash table size (where m = |X|)
  • beta(x,n) = trunc(hash(x’ # n’), k)’ mod N: Hash function parameterized by seed n
  • n = min{j in N | beta is collision-free on X}: Search for smallest collision-free seed
  • n’: Geometric code encoding n (variable-length prefix-free)

The algorithm: try seeds n = 0, 1, 2, … until beta(.,n) has no collisions on X. Each trial is geometrically distributed with success probability p(m,r), so the expected space for encoding n’ achieves the information-theoretic lower bound of roughly 1.44 bits per element.

Under the random oracle assumption, hash: {0,1}* -> {0,1}^infinity outputs uniform random bits, making the final encoding (n’, N) indistinguishable from a random bit string.

Rate-Distortion Tradeoff

The “rate-distortion” framing comes from information theory: what’s the rate (bits per key) for a given distortion (lookup time)?

Zero distortion (O(1) lookup): roughly n log n bits. Constant distortion (small tables): practical two-level schemes approach roughly 1.44n bits. Variable distortion: trade bits for lookup time continuously.

Why Cryptographic Matters

Non-cryptographic perfect hashes are deterministic, so adversaries can engineer collision-inducing inputs. A cryptographic perfect hash (random oracle) prevents adversarial key selection (can’t craft keys that break the hash), side-channel attacks (encoding reveals no information about keys), and fingerprinting (maximum entropy makes encodings look random).

The Algebra of Composition

Section 5 proves that composing perfect hash functions preserves injectivity. If h1: S -> T and h2: T -> U are injective, then h2 composed with h1: S -> U is injective.

This connects to my algebraic_hashing library, where composition of cryptographic hashes via XOR preserves both security and structure.

Read More