Over the past decade, I’ve published more than 120 open-source repositories on GitHub. They span encrypted search theory, statistical reliability engineering, generic programming, fuzzy logic, AI tools, Unix-philosophy CLI utilities, three novels, a community math database, and a digital legacy project motivated by the possibility that I might not be around to maintain any of it.
From the outside, it probably looks like a sprawl. From the inside, almost everything connects.
This post is the map I wish I’d written years ago. It’s for anyone who stumbles into my GitHub profile and wonders: what is all this, and how does it fit together?
Two threads run through everything:
Algebraic composability. I believe the right abstractions compose. Hash functions compose into perfect hash filters. Likelihood contributions compose into maximum likelihood estimators. Boolean predicates compose into fuzzy inference rules. CLI tools compose into personal knowledge pipelines. When I find a domain where composition is possible but nobody has made it ergonomic, I start a project.
Personal data sovereignty. Your conversations, bookmarks, photos, email, health records, and reading notes belong to you—not to a cloud platform that might shut down, change terms, or get acquired. A surprising number of my tools exist to keep personal data in local SQLite databases, queryable from the command line, exportable as plain text, and durable across decades.
If you keep those two threads in mind, the ecosystem stops looking like a sprawl and starts looking like a lattice.
The Research Foundation
Oblivious Computing and Encrypted Search
This is where it started. My master’s thesis explored how to search encrypted data without revealing what you’re searching for—the problem known as oblivious computing or encrypted search.
The core question: can you build a search index where the server processes queries without learning the query content, the document content, or even which documents matched? The answer is yes, but only with careful algebraic constraints on the encryption.
The key projects:
- encrypted_search_thesis — The thesis itself, formalizing encrypted search using trapdoor Boolean algebras
- algebraic_cipher_types — A type-theoretic framework for reasoning about ciphertexts as algebraic objects. This paper (blog post) defines ciphertext types that preserve specific algebraic operations under encryption.
- bernoulli_data_type — The Bernoulli model for approximate encrypted search—when exact matches aren’t possible, what’s the information-theoretic cost?
- boolean-algebra-over-trapdoor-sets — Building Boolean algebra operations (AND, OR, NOT) on encrypted sets
- encrypted_search_confidentiality — Measuring what encrypted search leaks via entropy and mutual information
These are theoretical projects—papers and proofs, not production systems. They establish a mathematical framework for thinking about computation over encrypted data. The practical insight: if you model a ciphertext as a type with algebraic operations, you can reason about which computations preserve confidentiality and which ones leak.
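To make the idea concrete, here is a toy sketch of a secure index in Python (an illustration of the general pattern, not one of the thesis constructions): keywords become keyed HMAC trapdoors on the client, the server stores only trapdoors, and Boolean queries reduce to set algebra over them. Even this toy leaks which documents match, which is precisely the kind of leakage encrypted_search_confidentiality tries to measure.

```python
import hmac, hashlib

def trapdoor(secret: bytes, keyword: str) -> bytes:
    """Keyed, deterministic digest: the server sees this, never the keyword."""
    return hmac.new(secret, keyword.lower().encode(), hashlib.sha256).digest()

def build_index(secret: bytes, docs: dict) -> dict:
    """Encrypted index: document id -> set of keyword trapdoors."""
    return {doc_id: {trapdoor(secret, w) for w in words}
            for doc_id, words in docs.items()}

# Boolean queries are just set algebra over trapdoors, so the server can
# evaluate AND/OR without learning the plaintext terms.
def matches_and(entry: set, trapdoors: list) -> bool:
    return all(t in entry for t in trapdoors)

def matches_or(entry: set, trapdoors: list) -> bool:
    return any(t in entry for t in trapdoors)

secret = b"client-side key, never sent to the server"
index = build_index(secret, {"d1": {"algebra", "search"},
                             "d2": {"algebra", "privacy"}})
query = [trapdoor(secret, "algebra"), trapdoor(secret, "privacy")]
print([d for d, entry in index.items() if matches_and(entry, query)])  # ['d2']
```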
Statistical Reliability
The other half of my thesis work, and the part that became production software. The question: if a series system fails but you can’t tell which component caused the failure, how do you estimate component lifetimes?
This is masked data analysis. A system has m components in series. When it fails at time t, you get a candidate set of possible failure causes, not a definitive answer. Sometimes the system is still running when you stop testing (right-censoring). Given all this uncertainty, can you do maximum likelihood estimation?
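To see what the likelihood looks like, take the simplest case: exponential component lifetimes (the packages below handle Weibull and other families). A masked failure contributes a sum over its candidate set; a right-censored system contributes only survival terms. A minimal sketch, with hypothetical helper names:

```python
import math

# Exponential components: density f(t) = lam * exp(-lam * t),
# survival R(t) = exp(-lam * t). `lam` is the vector of component rates.
def log_density(lam_j, t):  return math.log(lam_j) - lam_j * t
def log_survival(lam_j, t): return -lam_j * t

def loglik_masked(lam, t, candidates):
    """Masked failure at time t: the true cause is some j in `candidates`,
    and every other component survived past t, so the contribution is
    sum_{j in C} f_j(t) * prod_{k != j} R_k(t), computed here on the log scale."""
    total_surv = sum(log_survival(l, t) for l in lam)
    terms = [log_density(lam[j], t) + total_surv - log_survival(lam[j], t)
             for j in candidates]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))  # log-sum-exp

def loglik_censored(lam, t):
    """System still running at time t: every component survived past t."""
    return sum(log_survival(l, t) for l in lam)

lam = [0.5, 1.0, 0.2]
print(loglik_masked(lam, t=1.3, candidates=[0, 1]) + loglik_censored(lam, t=2.0))
```

Summing contributions like these over all systems gives the log-likelihood that the MLE machinery below maximizes.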
Yes. And the resulting R packages now form a layered architecture, each layer building on the one before it:
nabla — Exact derivatives via automatic differentiation
↓
femtograd — AD for statistical computing
↓
algebraic.mle — MLE as algebraic objects (on CRAN)
↓
compositional.mle — DSL for composing solvers (chaining, racing, restarts)
↓
likelihood.model — Composable likelihood framework (submitted to CRAN)
↓
likelihood.model.series.md — Masked series systems (next CRAN submission)
↓
wei.series.md.c1.c2.c3 — Weibull series estimation (the applied capstone)
I’ll detail the full R ecosystem below, but the key point is this: the theoretical work on masked data produced a general-purpose framework for composable statistical inference. The specific reliability application is just one instance of the pattern.
Information Theory and Sequential Prediction
A smaller but persistent thread: sequential prediction and information-theoretic approaches to learning.
- infinigram — Exploring what happens when n-gram models have unbounded context. Related to Solomonoff induction and context-tree weighting (CTW).
- fisher-flow — Connecting Fisher information geometry to information flow in statistical models
- langcalc — Token-level entropy calculations for language models
This cluster connects the encrypted search work (which is fundamentally about information leakage) to the statistical reliability work (which is about inference under missing information) through their shared foundation in information theory.
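For a sense of what "token-level entropy" means in practice (this is the underlying quantity, not langcalc's actual interface): given a model's next-token distribution, entropy measures how uncertain the model is, and the surprisal of the token that actually occurred is its per-token contribution to cross-entropy loss.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def surprisal_bits(p_observed):
    """Surprisal (in bits) of the token that actually occurred."""
    return -math.log2(p_observed)

# A confident model (left) carries less entropy than one that hedges (right).
print(entropy_bits([0.9, 0.05, 0.05]), entropy_bits([0.4, 0.3, 0.3]))
print(surprisal_bits(0.9), surprisal_bits(0.4))
```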
The R Package Ecosystem
Thirteen R packages, two on CRAN, one submitted, the rest on r-universe. They form a composable architecture where each layer adds capability without breaking the abstractions below.
The Core Stack
| Package | Role | Status |
|---|---|---|
| algebraic.mle | MLE objects with algebraic operations (compose, transform, subset) | CRAN |
| algebraic.dist | Probability distributions as algebraic objects | CRAN |
| likelihood.model | Composable likelihood contributions for statistical models | Submitted to CRAN |
| likelihood.model.series.md | Series system MLE with masked failure causes | Next CRAN submission |
The key design insight: likelihood contributions compose. When observations are independent, the total log-likelihood is the sum of individual contributions. likelihood.model provides the composition machinery. algebraic.mle provides the algebra for manipulating fitted models. Each domain-specific package (series systems, masked data, etc.) just provides the contributions.
This means you can mix and match. Define your own likelihood contributions and they plug into the same optimization, bootstrap, and model selection infrastructure.
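Here is the composition pattern in miniature, as a Python sketch rather than the packages' actual R interface: a contribution maps parameters and an observation to a log-likelihood term, independence turns the objective into a sum, and any optimizer can drive the composed objective.

```python
import math

# A contribution is any function (rate, observation) -> log-likelihood term.
def exact_obs(rate, t):     # an observed failure time, exponential model
    return math.log(rate) - rate * t

def censored_obs(rate, t):  # a unit still running at time t
    return -rate * t

def total_loglik(contributions, rate):
    """Independent observations: the log-likelihood is the sum of the terms."""
    return sum(contrib(rate, t) for contrib, t in contributions)

# Mixing observation types is just mixing contributions in one list.
data = [(exact_obs, 1.2), (exact_obs, 0.7), (censored_obs, 3.0)]

# Any optimizer works on the composed objective; a crude grid search suffices here.
grid = [i / 1000 for i in range(1, 5000)]
mle = max(grid, key=lambda r: total_loglik(data, r))
print(round(mle, 3))  # close to 2 / (1.2 + 0.7 + 3.0) ≈ 0.408
```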
Supporting Packages
| Package | Role |
|---|---|
| nabla | Automatic differentiation for R — gradient computation without finite differences |
| femtograd | Minimal forward-mode AD, a teaching implementation |
| hypothesize | Hypothesis testing framework built on the likelihood model |
| dfr_dist | Distribution-free reliability estimation |
| md.tools | Utilities for encoding and decoding masked data |
| wei.series.md.c1.c2.c3 | Weibull series system implementations |
| mdrelax | What happens when the standard masking assumptions (C1/C2/C3) are violated |
| compositional.mle | Compositional extensions to the MLE framework |
The Design Philosophy
Every package follows the same pattern: define an S3 object with generic functions (loglik(), score(), hess_loglik(), fit(), rdata()), make it composable, and let the algebra do the work. If you’ve used one package in the stack, you know the interface for all of them.
This is Stepanov’s philosophy applied to statistics: find the minimal algebraic structure, then build generic algorithms that work on anything satisfying that structure.
Data Structures and Algorithms
Python Libraries (PyPI)
Over 20 packages are published to PyPI under the queelius account, spanning data structures, symbolic computation, CLI tools, and more. The highlights:
- AlgoTree (16 stars) — Generic tree data structures and algorithms: traversal, manipulation, serialization, conversion between representations. (Blog post)
- AlgoGraph — Graph algorithms in the same style
- symlik — Symbolic likelihood computation (blog post)
- rerum — Term rewriting engine
- dotsuite — Boolean algebra operations
- jaf — JSON algebra (merge, diff, patch)
- jsonl-algebra — Algebraic operations on JSONL streams
- fuzzy-infer — Fuzzy logic inference engine
- nfa-tools — NFA manipulation and visualization
- src2md — Source code to markdown conversion
- pfc (Python bindings) — Python interface for the pfc prefix-free coding C++ library
Most of the *tk CLI tools (btk, ctk, ebk, etc.) are also installable as Python packages. The pattern: algebraic objects with well-defined operations—the same philosophy that drives the R and C++ packages, applied to Python.
C++ Libraries
The C++ projects are where the generic programming philosophy is most explicit. They’re inspired by Alexander Stepanov’s work—the idea that algorithms should be parameterized by the algebraic structures they operate on, not by specific data types. Most are header-only C++20 libraries.
Data structures and hashing:
- algebraic_hashing — Hash function composition as algebraic morphisms, with a DSL for combining hash functions (sketched after this list)
- sparse_spatial_hash — N-dimensional sparse spatial hashing for collision detection and neighbor queries
- maph — Space-efficient approximate mappings using perfect hash functions, with configurable storage and accuracy trade-offs
- pfc — Zero-copy, prefix-free data representations with algebraic types and succinct data structures
- cbt — Computational basis transforms between domains
- bloomy — Secure index based on Bloom filters (connecting to the encrypted search work)
- packed_data — Compact data representations
- accumux — Accumulator and multiplexer patterns
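The algebraic_hashing entry above is the cleanest example of treating hash functions as composable values. A Python paraphrase of the idea (the library itself is C++, and its DSL is richer than this):

```python
import hashlib

class Hash:
    """A hash function as a first-class value; >> composes them, morphism-style."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, data: bytes) -> bytes:
        return self.fn(data)
    def __rshift__(self, other):
        # (h1 >> h2)(x) = h2(h1(x)); the composite is itself a Hash,
        # and composition is associative, so pipelines regroup freely.
        return Hash(lambda data: other.fn(self.fn(data)))

sha     = Hash(lambda b: hashlib.sha256(b).digest())
blake16 = Hash(lambda b: hashlib.blake2b(b, digest_size=16).digest())
take8   = Hash(lambda b: b[:8])

pipeline = sha >> blake16 >> take8
bucket = int.from_bytes(pipeline(b"queelius"), "big") % 1024
print(bucket)
```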
Numerical and algebraic computation:
- limes — Composable calculus expressions: symbolic differentiation, numerical integration, algebraic composition
- elementa — Pedagogical C++20 linear algebra library
- gradator — Pedagogical C++20 automatic differentiation library
- dual — Dual number arithmetic for automatic differentiation
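The dual-number trick behind dual (and, in spirit, gradator, femtograd, and nabla) fits in a few lines. A minimal Python sketch of forward-mode AD, not any of those libraries' APIs:

```python
class Dual:
    """Dual numbers a + b*eps with eps^2 = 0: carrying b through ordinary
    arithmetic yields the derivative alongside the value (forward-mode AD)."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))              # seed the input's derivative with 1
print(y.value, y.deriv)            # 57.0 26.0
```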
Simulation:
- barnes-hut — Barnes-Hut tree algorithm for N-body simulation
The Stepanov essay series (Seeing Structure First) explains the philosophy: an algorithm like power(x, n, op) computes x^n under any monoid operation—exponentiation, matrix power, string repetition, path composition. The C++ libraries embody this by treating data structures as instances of algebraic concepts, not as standalone implementations.
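Here is that power(x, n, op) example transcribed into Python (the essays and libraries work in C++): the algorithm asks only that op be associative, so the same dozen lines compute integer powers, repeated strings, and matrix powers.

```python
def power(x, n, op):
    """x combined with itself n times under an associative op,
    in O(log n) applications via repeated squaring (n >= 1)."""
    result = None
    while n > 0:
        if n & 1:
            result = x if result is None else op(result, x)
        x = op(x, x)
        n >>= 1
    return result

def matmul2(a, b):
    """2x2 matrix product, so power() also computes matrix powers."""
    return ((a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]),
            (a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]))

print(power(3, 13, lambda a, b: a * b))            # exponentiation: 1594323
print(power("ab", 4, lambda a, b: a + b))          # string repetition: 'abababab'
print(power(((1, 1), (1, 0)), 10, matmul2)[0][1])  # 10th Fibonacci number: 55
```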
Boolean, Fuzzy, and Symbolic Computation
A cluster of Python projects (all on PyPI) exploring logic beyond true/false:
- dotsuite — Boolean algebra operations and visualization
- jaf — JSON algebra: treating JSON documents as algebraic objects with merge, diff, and patch operations
- fuzzy-infer — Fuzzy logic inference engine
- soft-circuit — Soft Boolean circuits where gates have continuous rather than discrete outputs
- fuzzy-soft-circuit — Combining fuzzy logic with soft circuits
- fuzzy-logic-search — Search systems that use fuzzy logic for relevance scoring
- rerum — Term rewriting engine
- xtk — Expression toolkit for symbolic manipulation
- tree_rewriter — Tree-based term rewriting
This cluster connects to the encrypted search work (Boolean algebra over trapdoor sets) and to the AI work (soft circuits as differentiable logic). The thread: logic is algebraic structure, and generalizing from crisp to fuzzy to soft to encrypted is a matter of changing which algebraic laws you preserve.
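A minimal sketch of that progression, using the product t-norm and probabilistic sum (one common choice among several):

```python
# The same connectives at two points on the crisp -> soft spectrum.
# Crisp gates act on {0, 1}; soft gates act on [0, 1] and are differentiable,
# which is what lets them sit inside learned models.
def crisp_and(a, b): return a & b
def crisp_or(a, b):  return a | b
def crisp_not(a):    return 1 - a

def soft_and(a, b):  return a * b            # product t-norm
def soft_or(a, b):   return a + b - a * b    # probabilistic sum
def soft_not(a):     return 1.0 - a

# De Morgan's law survives the generalization: NOT(a AND b) == (NOT a) OR (NOT b)
a, b = 0.8, 0.3
print(round(soft_not(soft_and(a, b)), 6),
      round(soft_or(soft_not(a), soft_not(b)), 6))  # both 0.76
```

Which laws survive is the whole point: De Morgan holds, but the law of excluded middle does not, since soft_or(a, soft_not(a)) < 1 whenever 0 < a < 1.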
The Unix Toolkit Constellation
Fifteen CLI tools that share a philosophy: SQLite-backed, CLI-first, composable, personal data sovereignty. They use a naming convention—most end in tk (toolkit) or k—and are designed to work together through Unix pipes and shared conventions.
| Tool | Domain | Description |
|---|---|---|
| btk | Bookmarks | SQLite-backed bookmarks with hierarchical tags and a typed query DSL |
| ebk | Ebooks | E-book library management, metadata extraction, format conversion |
| mtk | Email | Local email archival and search |
| ptk | Photos | Photo management with EXIF metadata and tagging |
| atk | Audio | Audio file management and metadata |
| ctk | Conversations | LLM conversation archival and search (blog post) |
| xtk | Expressions | Symbolic expression manipulation |
| jot | Notes | Journaling and note-taking |
| deets | Personal metadata | Identity and metadata management |
| repoindex | Repositories | Git repository database and query tool |
| crier | Social media | Cross-posting blog content to social platforms (12 stars) |
| chop | Images | Image manipulation |
| dapple | Terminal graphics | Rich terminal output and visualization |
| clerk | Workflows | Task and workflow management |
| chartfold | Health records | Personal health record management |
The Philosophy
These tools share a conviction: your personal data should live in local SQLite databases, not in someone else’s cloud. Every tool stores its data in a format you can query with SQL, back up with cp, and inspect with sqlite3. They’re designed to compose through pipes:
# Find bookmarks tagged "research", get their URLs, search your ebooks for related content
btk query --tag research --format url | xargs -I{} ebk search --related {}
The blog series on digital legacy explains the motivation. When I was diagnosed with cancer, the question “what happens to my digital life?” became urgent. These tools are one answer: if your data lives in portable, open formats on hardware you control, it survives you.
AI and LLM Projects
The largest and fastest-growing cluster, spanning reasoning, search, and safety.
LLM Tools and Research
- elasticsearch-lm (37 stars) — Language model integration with Elasticsearch. This bridges the encrypted search research with modern LLM applications.
- mcts-reasoning — Monte Carlo Tree Search for structured reasoning. Uses tree search (connecting to AlgoTree) to improve LLM reasoning quality. (Blog post)
- dreamlog — Logic programming with LLM integration and wake-sleep learning cycles
- complex-network-rag — RAG (Retrieval-Augmented Generation) over complex document networks
- ollama_data_tools (4 stars) — Utilities for working with Ollama models and data
- llm-bayes — Bayesian reasoning with LLMs
- synthdata — Synthetic data generation
- agentum — Unified framework for sequential decision-making, from classical search to deep RL
- itinero — LLM-powered web automation through composable strategies and Playwright
- AutoPoiesi — Self-organizing systems inspired by autopoiesis theory
The Thread
These aren’t random AI projects. The encrypted search work asks: “How do you search without revealing what you know?” The LLM work asks the complement: “How do you reason with what you know?” Both are about the interface between information and computation. elasticsearch-lm literally bridges the two—it’s a language model that searches.
Digital Legacy Projects
Five projects exploring what happens to digital identity after death. This is the Long Echo series in code form.
- longecho — The framework for durable personal data. Redundant storage, format migration, integrity verification.
- longshade — Generate a conversable persona from personal data: conversations, writings, and email
- posthumous — Automated actions triggered by death or incapacity. Dead man’s switches, scheduled messages, data release.
- pagevault — Password-protect semi-private content on static sites like Hugo blogs
- cryptoid — Client-side encrypted content for Hugo static sites with multi-user access control
These are nascent. They share a philosophy but don’t yet share infrastructure. The vision: a unified system where your personal data (managed by the *tk tools) is stored durably (longecho), distilled into a conversable persona (longshade), protected on your sites (pagevault, cryptoid), and handled according to your wishes after death (posthumous).
The essays on digital legacy provide the philosophical backdrop. The tools provide the engineering.
Literature
Three novels, each exploring different aspects of consciousness and intelligence:
- echoes-of-the-sublime — Philosophical horror, approximately 103,000 words. What happens when consciousness encounters something genuinely beyond its capacity to process? (Blog post)
- the-policy — AI alignment science fiction. A near-future story about the consequences of a specific alignment strategy. (Blog series)
- call-of-asheron — Epic fantasy. Emergence, self-organization, and what happens when a world’s underlying rules become visible to its inhabitants.
The novels aren’t decoration. They’re explorations of the same themes that drive the code—consciousness, information, what persists and what doesn’t—in a medium that allows for ambiguity and nuance that research papers can’t.
There’s also stories, a collection of shorter fiction.
The Erdős Problems Database
One project worth mentioning separately: erdosproblems (470 stars) is a community database for the problems listed on erdosproblems.com—Thomas Bloom’s curation of Paul Erdős’s open problems in combinatorics, number theory, and graph theory. My contribution is to the structured data layer: making the problems machine-readable, searchable, and cross-referenced. It reflects the same instinct that drives much of this ecosystem: make knowledge structured and open.
Programming Languages and Interpreters
A small cluster of projects exploring language design:
- jsl — JSON-based scripting language
- jsonl-algebra — Algebraic operations on JSONL streams
- nfa-tools — Non-deterministic finite automaton manipulation
- dagshell — Shell with DAG-based execution
- tex2any / texflow — LaTeX transformation tools
These connect to the symbolic computation cluster (rerum, xtk, tree_rewriter) through their shared concern with formal language manipulation.
Education and Essay Series
Four ongoing essay series, each associated with a repository of supporting material:
- stepanov — Generic programming in the style of Stepanov. Eleven essays on finding algebraic structure in algorithms.
- sicp — Abstraction and composition, inspired by Structure and Interpretation of Computer Programs.
- the-learning-problem — Machine learning from an information-theoretic perspective. Sequential prediction, Solomonoff induction, compression as learning.
- the-long-echo — Digital legacy, personal data sovereignty, and what persists.
Space Simulation
Two projects that stand somewhat apart:
- space-sandbox-sim — N-body physics sandbox
- star-system-sim — Star system generation and simulation
These connect to the C++ algorithmic work (barnes-hut tree algorithm for N-body simulation) and to the generic programming philosophy (physics simulation as algebraic structure).
Infrastructure and Meta-tools
The projects that support everything else:
- metafunctor — This Hugo blog. The central publication venue for all the work above.
- queelius.github.io — GitHub Pages deployment
- queelius.r-universe.dev — R-universe configuration for the R package ecosystem
- src2md — Convert source code to markdown for documentation
- texwatch — Watch LaTeX files and rebuild on change
- sandrun — Sandboxed command execution
- zeroipc — Zero-copy IPC mechanisms (multi-language)
Gap Analysis: What’s Missing
Looking at 120+ projects as a whole, several gaps become visible:
No Unified Documentation Portal
Sixty-two repos have GitHub Pages sites, but there’s no cross-project index. If you want to understand how algebraic.mle connects to likelihood.model connects to likelihood.model.series.md, you have to read three separate documentation sites and mentally stitch them together. A single documentation portal with cross-linked API docs and a dependency graph would make the ecosystem navigable.
Missing CI Infrastructure
Many beta-stage repositories lack continuous integration. The R packages have CI through r-universe, but the Python and C++ projects are inconsistent. A standardized CI template across the ecosystem would catch breakage earlier.
No Rich Python Package Index
Over 20 packages are on PyPI under the queelius account, but the PyPI profile is just a flat list of names. There’s no equivalent to r-universe—no descriptions, no dependency graph, no grouping by theme. A curated index page (or a Python equivalent of r-universe) would make the Python ecosystem as navigable as the R one.
No Cross-Project Dependency Graph
It’s hard to see which projects build on which. The R packages have an implicit dependency tree, but it’s not visualized. The *tk tools share conventions but not code. A visual dependency map would help contributors and users understand the architecture.
C++ Libraries Lack a Unifying Build Mechanism
The C++ projects don’t share a package manager configuration (Conan, vcpkg, or similar). Each library is standalone. For users who want to use multiple libraries together—say, algebraic_hashing with sparse_spatial_hash—there’s no easy integration path.
Scattered Fuzzy Logic
Four separate fuzzy logic projects (fuzzy-infer, fuzzy-logic-search, fuzzy-soft-circuit, soft-circuit) share concepts but not code. They could be unified into a single, layered fuzzy logic library with clear separation between the inference engine, the circuit model, and the search application.
The *tk Tools Need a Meta-Installer
The Unix toolkit tools share a philosophy and naming convention, but there’s no umbrella project, shared library, or meta-installer. Common patterns (SQLite storage, CLI argument parsing, query DSL) are reimplemented in each tool. A shared foundation library would reduce duplication and make it easier to build new *tk tools.
No Benchmarking Suite for Probabilistic Data Structures
The hashing and data structure projects (maph, algebraic_hashing, sparse_spatial_hash, pfc) lack comparative benchmarks. A shared benchmark suite would make performance claims credible and help users choose between implementations.
Digital Legacy Projects Need Integration
The longecho/longshade/posthumous projects articulate a compelling vision but exist as separate, early-stage prototypes. The integration story—how they work together, and how they connect to the *tk tools—hasn’t been built yet.
Missing End-to-End Tutorials
The R packages have individual documentation, but there’s no tutorial showing how to use them together: generate synthetic masked data, fit models, compare via AIC, bootstrap confidence intervals, and visualize results—all using the composable architecture.
Where to Start
If you’ve read this far and want to explore, here are entry points by interest:
If you’re a statistician or reliability engineer: Start with algebraic.mle (on CRAN), then read the likelihood.model.series.md post. The composable likelihood architecture is the most mature part of the ecosystem.
If you’re a C++ programmer interested in generic programming: Start with the Stepanov series, especially Seeing Structure First. Then look at the C++ libraries (algebraic_hashing, maph, pfc) for worked examples of the philosophy.
If you work with trees or graphs in Python: AlgoTree and AlgoGraph are on PyPI and treat data structures as composable algebraic objects.
If you care about personal data sovereignty: Start with btk (bookmarks) or ctk (conversations)—they’re the most polished *tk tools. Then read the Long Echo essays for the philosophical motivation.
If you’re interested in AI/LLM tools: elasticsearch-lm bridges traditional search with language models. mcts-reasoning applies tree search to LLM reasoning.
If you’re a mathematician: The oblivious computing papers apply algebra to cryptography. The information theory posts connect Solomonoff induction to practical prediction.
If you want to read fiction: echoes-of-the-sublime is the longest and most developed. the-policy is the most timely—AI alignment as lived experience.
The Shape of the Whole
If I step back and look at everything together, I see a single question asked in many registers:
How do you build reliable knowledge from noisy, incomplete, and partial observations?
In encrypted search, the noise is intentional—it is the price of privacy. In reliability engineering, the noise is censoring and masking—the data you wish you had. In fuzzy logic, the noise is inherent in vague predicates. In LLM reasoning, the noise is the stochasticity of language models. In digital legacy, the noise is time itself—bit rot, platform decay, forgetting.
The answer, in every case, is the same: algebraic structure. Find the right abstractions, make them compose, and the noise becomes manageable.
What do 120+ projects look like, all at once? Taken together, they look like a person thinking out loud for over a decade. The early projects are tentative—data structures, small algorithms, conference papers. The middle period is the thesis work: rigorous, focused, building toward specific results. The recent projects are more ambitious and more personal—novels, digital legacy tools, an essay about watching intelligence leave the body.
The connecting thread isn’t a technology or a domain. It’s a question: what structures persist? Algebraic structures persist across implementations. Mathematical results persist across paradigms. Well-designed software interfaces persist across refactoring. Stories persist across readers. And if you’re careful about formats and infrastructure, personal data can persist across a lifetime—and beyond.
That’s the ecosystem. It’s incomplete, unevenly documented, and will probably never be finished. But the map is here now, and the territory is open.