Algebraic_hashing

Below you will find pages that utilize the taxonomy term “Algebraic_hashing”

The Beautiful Deception: How 256 Bits Pretend to be Infinity

July 1, 2024

How do you store infinity in 256 bits? You don’t. But you can fake it well enough that no bounded observer can tell the difference. This paper is about that deception, why it works, and what it tells us about randomness.

The impossible oracle

A random oracle maps any input to an infinite sequence of perfectly random bits. Try to implement one and you fail immediately:

Memory unboundedness: each new query exhausts memory
Non-serializability: can’t save/restore state
Non-reproducibility: each instance generates different values
Non-distributability: can’t share across systems

This isn’t a limitation of current hardware. It’s a constructive proof that true random oracles can’t exist in our computational universe.

The lie that works

From this impossibility comes something useful:

class LazyDigest:
    def __init__(self, seed):
        self.seed = seed

    def __getitem__(self, index):
        return hash(seed || index)[0]

256 bits of entropy generating what appears to be an infinite random sequence.

The deception:

Appears: infinite random sequence
Actually: deterministic function with 256 bits of state
Information content: K(LazyDigest) = 256 bits + constant
Apparent information: infinite

We’re achieving a compression ratio of infinity, representing unbounded data with bounded information.

Why it works

Computational indistinguishability. If h is a secure PRF, no polynomial-time algorithm can tell LazyDigest apart from truly random output. We’re not random. We’re computationally hard to distinguish from random. This weaker guarantee is sufficient for all of cryptography.

Since LazyDigest has finite state, it must eventually cycle. After at most 2^256 queries, it repeats. But the expected cycle length is 2^128, roughly 10^38. At a billion queries per second, cycling takes about 10^21 years. The universe is roughly 10^10 years old.

Advanced constructions

Hierarchical seeding extends the effective period:

epoch_seed = h(master_seed || "epoch" || epoch)
chunk_seed = h(epoch_seed || "chunk" || chunk)
value = h(chunk_seed || position)[0]

XOR multi-hash hedges against individual algorithm failures:

result = sha256(seed||index)[0] ^ sha512(seed||index)[0] ^
         sha3_256(seed||index)[0] ^ blake2b(seed||index)[0]

The system stays secure if at least one hash function holds. This hedges against future cryptanalysis, quantum vulnerabilities, and implementation bugs.

Sponge construction reserves capacity bits that never leave the system, providing tunable security with 2^(capacity/2) collision resistance.

Random oracles and uncomputable reals

Most real numbers are uncomputable. They require infinite information to specify. A true random oracle is the cryptographic analog:

Computable reals (like pi, e): measure zero, finite programs
Uncomputable reals: measure one, infinite information
LazyDigest: computable, appears random
Random oracle: uncomputable, truly random

We’re using computable functions to approximate uncomputable ones.

Algebraic Hashing: Composable Hash Functions Through XOR

November 1, 2022

Most hash libraries treat hash functions as opaque blobs. You put data in, you get bits out, and that’s the end of the story. Algebraic Hashing takes a different approach: it exposes the mathematical structure underneath, so you can compose hash functions like algebraic expressions. And because this is C++20 with concepts and templates, the composition resolves entirely at compile time. Zero runtime overhead.

The observation

Hash functions form an abelian group under XOR:

Closure: h1 XOR h2 is still a valid hash function
Associativity: (h1 XOR h2) XOR h3 = h1 XOR (h2 XOR h3)
Identity: XOR with zero
Inverses: each hash is its own inverse under XOR

This is a clean algebraic structure, and it’s the foundation for everything that follows.

What you can do with it

Compile-time composition. Using C++20 concepts and template metaprogramming:

auto composed = fnv1a<> ^ sha256<>;
auto hash = composed("data");  // Zero runtime overhead

All composition resolves at compile time. No virtual dispatch, no function pointers.

Provable properties. XOR composition preserves uniform distribution (under independence), avalanche effect, and collision resistance for cryptographic hashes. These aren’t just empirical observations. They follow from the group structure.

Universal interface. Works with any hash function: non-cryptographic (FNV-1a), perfect (FKS), or cryptographic (SHA-256). The algebra doesn’t care about the implementation details.

Practical uses

Domain separation: hash_user ^ hash_timestamp prevents collision attacks across domains
Perfect hashing: FKS two-level scheme with pluggable base hash functions
Composite keys: hash multiple fields independently, then XOR
Type-based hashing: different hash functions for different types, composed generically

Connection to oblivious computing

This work shares DNA with my oblivious computing research. The common thread is making mathematical structure explicit in the type system. Just as Bernoulli types enforce privacy invariants algebraically, algebraic hashing enforces compositional invariants through group theory. Same philosophy, different domain.

The library is header-only C++20 with zero-cost abstractions via concepts.

View the whitepaper | GitHub