Building Languages to Solve Problems
When a problem is complex enough, the solution is often to build a language for that problem. SICP's most powerful idea.
Browse posts by category
When a problem is complex enough, the solution is often to build a language for that problem. SICP's most powerful idea.
Many structures come in pairs: forward/reverse AD, push/pull iteration, encode/decode. Recognizing duality lets you transfer theorems and insights between domains.
A reflection on eleven explorations in generic programming—how algorithms arise from algebraic structure.
A conceptual introduction to entropy maps—implementing functions with hash functions and prefix-free codes.
A mathematically elegant C++20 library for algebraic text processing and compositional parsing with fuzzy matching capabilities.
AlgoGraph brings functional programming elegance to graph algorithms with immutable data structures, pipe-based transformers, declarative selectors, and lazy views.
A C++17 header-only library implementing Computational Basis Transforms - a unified framework for understanding how FFT, logarithmic arithmetic, and Bayesian inference are all instances of the same pattern.
A framework for querying structured JSON documents using fuzzy logic principles, producing degree-of-membership scores instead of binary relevance.
A modern C++ header-only library implementing disjoint interval sets as first-class mathematical objects with rigorous Boolean algebra operations.
A powerful symbolic expression toolkit for rule-based term rewriting with pattern matching, multiple input formats, and an interactive REPL.
I asked an AI to brutally analyze my entire body of work—140+ repositories, 50+ papers, a decade and a half of research. The assignment: find the patterns I couldn’t see, the obsessions I didn’t know I had, the unifying thesis underlying …
I’ve been working on a series of papers that develop a unified theoretical framework for approximate and oblivious computing, centered around what I call Bernoulli types. These papers explore how we can build rigorous foundations for systems …
What if we could compute on encrypted data while preserving algebraic structure? Not through expensive homomorphic encryption, but through a principled mathematical framework that unifies oblivious computing, Bernoulli types, and categorical …
Encrypted search has a fundamental problem: you can’t hide what you’re looking for. Even with the best encryption, search patterns leak information. My recent work develops a new approach using oblivious Bernoulli types to achieve …
A virtual POSIX-compliant filesystem implementation using content-addressable DAG storage with SHA256 deduplication.
ZeroIPC transforms shared memory from passive storage into an active computational substrate, enabling functional and reactive programming paradigms across process boundaries with zero-copy performance.
Three approaches to computing derivatives---forward-mode AD, reverse-mode AD, and finite differences---each with different trade-offs. Understanding when to use each is essential for numerical computing and machine learning.
A production-ready streaming data processing system implementing boolean algebra over nested JSON structures. JAF brings dotsuite's pedagogical concepts to production with lazy evaluation, S-expression queries, and memory-efficient windowed …
A production-ready implementation of relational algebra for JSONL data with full support for nested structures. jsonl-algebra brings dotsuite's dotrelate concepts to production with streaming operations, schema inference, and composable pipelines.
A mathematically grounded ecosystem of composable tools for manipulating nested data structures. From simple helper functions to sophisticated data algebras, guided by purity, pedagogy, and the principle of least power.
A Lisp-like functional programming language designed for network transmission and distributed computing. JSL makes JSON serialization a first-class design principle, enabling truly mobile code with serializable closures and resumable computation.
A powerful, immutable-by-default tree manipulation library for Python with functional programming patterns, composable transformations, and advanced pattern matching.
A high-performance key-value storage system achieving sub-microsecond latency through memory-mapped I/O, approximate perfect hashing, and lock-free atomic operations. 10M ops/sec single-threaded, 98M ops/sec with 16 threads—12× faster than Redis, 87× …
A header-only C++20 library that achieves 3-10× compression with zero marshaling overhead. PFC makes compression an intrinsic type property through prefix-free codes (Elias Gamma/Delta, Fibonacci, Rice), algebraic types, and Stepanov's generic …
A modern C++20 library for compositional online data reductions with numerically stable algorithms and algebraic composition.
Sean Parent's type erasure technique provides value-semantic polymorphism without inheritance. Combined with Stepanov's algebraic thinking, we can type-erase entire algebraic structures.
What if a perfect hash function could simultaneously be: (1) cryptographically secure, (2) space-optimal, and (3) maximum-entropy encoded? This paper proves such a construction exists—and analyzes exactly what you sacrifice to get all three.
Numerical integration demonstrates both classical numerical analysis and Stepanov's philosophy: by identifying the minimal algebraic requirements, our quadrature routines work with dual numbers for automatic differentiation under the integral.
Reverse-mode automatic differentiation powers modern machine learning. Understanding how it works demystifies PyTorch, JAX, and TensorFlow---it's just the chain rule applied systematically.
Most hash libraries treat hash functions as black boxes. Algebraic Hashing exposes their mathematical structure, letting you compose hash functions like algebraic expressions—with zero runtime overhead.
Hash functions form an abelian …
The art of numerical differentiation lies in choosing step size h wisely---small enough that the approximation is good, but not so small that floating-point errors dominate.
In the paper, “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” the authors, Suleman, Mutlu, Qureshi, and Patt, essentially concern themselves with the problem popularly revealed in …
Multiprocessor synchronization is a notoriously tricky subject matter. Unlike with a single thread of execution, in a shared-resource system, where resources are shared among multiple independent processors, we must think very hard about how the …
Dual numbers extend our number system with an infinitesimal epsilon where epsilon^2 = 0. Evaluating f(x + epsilon) yields f(x) + epsilon * f'(x)---the derivative emerges automatically from the algebra.
elementa is a pedagogical linear algebra library where every design decision prioritizes clarity over cleverness---code that reads like a textbook that happens to compile.
The same GCD algorithm works for integers and polynomials because both are Euclidean domains. This profound insight shows how algebraic structure determines algorithmic applicability.
Rational numbers give us exact arithmetic where floating-point fails. The implementation reveals deep connections to GCD, the Stern-Brocot tree, and the algebraic structure of fields.
The Miller-Rabin primality test demonstrates how probabilistic algorithms can achieve arbitrary certainty, trading absolute truth for practical efficiency.
Integers modulo N form a ring---an algebraic structure that determines which algorithms apply. Understanding this structure unlocks algorithms from cryptography to competitive programming.
The Russian peasant algorithm teaches us that one algorithm can compute products, powers, Fibonacci numbers, and more---once we see the underlying algebraic structure.