Over the past decade, I’ve published more than 120 open-source repositories on GitHub. They span encrypted search theory, statistical reliability engineering, generic programming, fuzzy logic, AI tools, Unix-philosophy CLI utilities, three novels, a community math database, and a digital legacy project motivated by the possibility that I might not be around to maintain any of it.
From the outside, it probably looks like a sprawl. From the inside, almost everything connects.
This post is the map I wish I’d written years ago. It’s for anyone who stumbles into my GitHub profile and wonders: what is all this, and how does it fit together?
Two threads run through everything:
Algebraic composability. I believe the right abstractions compose. Hash functions compose into perfect hash filters. Likelihood contributions compose into maximum likelihood estimators. Boolean predicates compose into fuzzy inference rules. CLI tools compose into personal knowledge pipelines. When I find a domain where composition is possible but nobody has made it ergonomic, I start a project.
Personal data sovereignty. Your conversations, bookmarks, photos, email, health records, and reading notes belong to you—not to a cloud platform that might shut down, change terms, or get acquired. A surprising number of my tools exist to keep personal data in local SQLite databases, queryable from the command line, exportable as plain text, and durable across decades.
If you keep those two threads in mind, the ecosystem stops looking like a sprawl and starts looking like a lattice.
The Research Foundation
Oblivious Computing and Encrypted Search
This is where it started. My master’s thesis explored how to search encrypted data without revealing what you’re searching for—the problem known as oblivious computing or encrypted search.
The core question: can you build a search index where the server processes queries without learning the query content, the document content, or even which documents matched? The answer is yes, but only with careful algebraic constraints on the encryption.
The key projects:
- encrypted_search_thesis — The thesis itself, formalizing encrypted search using trapdoor Boolean algebras
- algebraic_cipher_types — A type-theoretic framework for reasoning about ciphertexts as algebraic objects. This paper (blog post) defines ciphertext types that preserve specific algebraic operations under encryption.
- bernoulli_data_type — The Bernoulli model for approximate encrypted search—when exact matches aren’t possible, what’s the information-theoretic cost?
- boolean-algebra-over-trapdoor-sets — Building Boolean algebra operations (AND, OR, NOT) on encrypted sets
- encrypted_search_confidentiality — Measuring what encrypted search leaks via entropy and mutual information
These are theoretical projects—papers and proofs, not production systems. They establish a mathematical framework for thinking about computation over encrypted data. The practical insight: if you model a ciphertext as a type with algebraic operations, you can reason about which computations preserve confidentiality and which ones leak.
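To make the idea concrete, here is a toy sketch of a secure index in Python (an illustration of the general pattern, not one of the thesis constructions): keywords become keyed HMAC trapdoors on the client, the server stores only trapdoors, and Boolean queries reduce to set algebra over them. Even this toy leaks which documents match, which is precisely the kind of leakage encrypted_search_confidentiality tries to measure.

```python
import hmac, hashlib

def trapdoor(secret: bytes, keyword: str) -> bytes:
    """Keyed, deterministic digest: the server sees this, never the keyword."""
    return hmac.new(secret, keyword.lower().encode(), hashlib.sha256).digest()

def build_index(secret: bytes, docs: dict) -> dict:
    """Encrypted index: document id -> set of keyword trapdoors."""
    return {doc_id: {trapdoor(secret, w) for w in words}
            for doc_id, words in docs.items()}

# Boolean queries are just set algebra over trapdoors, so the server can
# evaluate AND/OR without learning the plaintext terms.
def matches_and(entry: set, trapdoors: list) -> bool:
    return all(t in entry for t in trapdoors)

def matches_or(entry: set, trapdoors: list) -> bool:
    return any(t in entry for t in trapdoors)

secret = b"client-side key, never sent to the server"
index = build_index(secret, {"d1": {"algebra", "search"},
                             "d2": {"algebra", "privacy"}})
query = [trapdoor(secret, "algebra"), trapdoor(secret, "privacy")]
print([d for d, entry in index.items() if matches_and(entry, query)])  # ['d2']
```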
Statistical Reliability
The other half of my thesis work, and the part that became production software. The question: if a series system fails but you can’t tell which component caused the failure, how do you estimate component lifetimes?
This is masked data analysis. A system has m components in series. When it fails at time t, you get a candidate set of possible failure causes, not a definitive answer. Sometimes the system is still running when you stop testing (right-censoring). Given all this uncertainty, can you do maximum likelihood estimation?
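To see what the likelihood looks like, take the simplest case: exponential component lifetimes (the packages below handle Weibull and other families). A masked failure contributes a sum over its candidate set; a right-censored system contributes only survival terms. A minimal sketch, with hypothetical helper names:

```python
import math

# Exponential components: density f(t) = lam * exp(-lam * t),
# survival R(t) = exp(-lam * t). `lam` is the vector of component rates.
def log_density(lam_j, t):  return math.log(lam_j) - lam_j * t
def log_survival(lam_j, t): return -lam_j * t

def loglik_masked(lam, t, candidates):
    """Masked failure at time t: the true cause is some j in `candidates`,
    and every other component survived past t, so the contribution is
    sum_{j in C} f_j(t) * prod_{k != j} R_k(t), computed here on the log scale."""
    total_surv = sum(log_survival(l, t) for l in lam)
    terms = [log_density(lam[j], t) + total_surv - log_survival(lam[j], t)
             for j in candidates]
    m = max(terms)
    return m + math.log(sum(math.exp(x - m) for x in terms))  # log-sum-exp

def loglik_censored(lam, t):
    """System still running at time t: every component survived past t."""
    return sum(log_survival(l, t) for l in lam)

lam = [0.5, 1.0, 0.2]
print(loglik_masked(lam, t=1.3, candidates=[0, 1]) + loglik_censored(lam, t=2.0))
```

Summing contributions like these over all systems gives the log-likelihood that the MLE machinery below maximizes.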
Yes. And the resulting R packages now form a layered architecture, each layer building on the one before it:
nabla — Exact derivatives via automatic differentiation
↓
femtograd — AD for statistical computing
↓
algebraic.mle — MLE as algebraic objects (on CRAN)
↓
compositional.mle — DSL for composing solvers (chaining, racing, restarts)
↓
likelihood.model — Composable likelihood framework (submitted to CRAN)
↓
likelihood.model.series.md — Masked series systems (next CRAN submission)
↓
wei.series.md.c1.c2.c3 — Weibull series estimation (the applied capstone)
I’ll detail the full R ecosystem below, but the key point is this: the theoretical work on masked data produced a general-purpose framework for composable statistical inference. The specific reliability application is just one instance of the pattern.
Information Theory and Sequential Prediction
A smaller but persistent thread: sequential prediction and information-theoretic approaches to learning.
- infinigram — Exploring what happens when n-gram models have unbounded context. Related to Solomonoff induction and context-tree weighting (CTW).
- fisher-flow — Connecting Fisher information geometry to information flow in statistical models
- langcalc — Token-level entropy calculations for language models
This cluster connects the encrypted search work (which is fundamentally about information leakage) to the statistical reliability work (which is about inference under missing information) through their shared foundation in information theory.
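For a sense of what "token-level entropy" means in practice (this is the underlying quantity, not langcalc's actual interface): given a model's next-token distribution, entropy measures how uncertain the model is, and the surprisal of the token that actually occurred is its per-token contribution to cross-entropy loss.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def surprisal_bits(p_observed):
    """Surprisal (in bits) of the token that actually occurred."""
    return -math.log2(p_observed)

# A confident model (left) carries less entropy than one that hedges (right).
print(entropy_bits([0.9, 0.05, 0.05]), entropy_bits([0.4, 0.3, 0.3]))
print(surprisal_bits(0.9), surprisal_bits(0.4))
```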
The R Package Ecosystem
Thirteen R packages, two on CRAN, one submitted, the rest on r-universe. They form a composable architecture where each layer adds capability without breaking the abstractions below.
The Core Stack
| Package | Role | Status |
|---|---|---|
| algebraic.mle | MLE objects with algebraic operations (compose, transform, subset) | CRAN |
| algebraic.dist | Probability distributions as algebraic objects | CRAN |
| likelihood.model | Composable likelihood contributions for statistical models | Submitted to CRAN |
| likelihood.model.series.md | Series system MLE with masked failure causes | Next CRAN submission |
The key design insight: likelihood contributions compose. When observations are independent, the total log-likelihood is the sum of individual contributions. likelihood.model provides the composition machinery. algebraic.mle provides the algebra for manipulating fitted models. Each domain-specific package (series systems, masked data, etc.) just provides the contributions.
This means you can mix and match. Define your own likelihood contributions and they plug into the same optimization, bootstrap, and model selection infrastructure.
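Here is the composition pattern in miniature, as a Python sketch rather than the packages' actual R interface: a contribution maps parameters and an observation to a log-likelihood term, independence turns the objective into a sum, and any optimizer can drive the composed objective.

```python
import math

# A contribution is any function (rate, observation) -> log-likelihood term.
def exact_obs(rate, t):     # an observed failure time, exponential model
    return math.log(rate) - rate * t

def censored_obs(rate, t):  # a unit still running at time t
    return -rate * t

def total_loglik(contributions, rate):
    """Independent observations: the log-likelihood is the sum of the terms."""
    return sum(contrib(rate, t) for contrib, t in contributions)

# Mixing observation types is just mixing contributions in one list.
data = [(exact_obs, 1.2), (exact_obs, 0.7), (censored_obs, 3.0)]

# Any optimizer works on the composed objective; a crude grid search suffices here.
grid = [i / 1000 for i in range(1, 5000)]
mle = max(grid, key=lambda r: total_loglik(data, r))
print(round(mle, 3))  # close to 2 / (1.2 + 0.7 + 3.0) ≈ 0.408
```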
Supporting Packages
| Package | Role |
|---|---|
| nabla | Automatic differentiation for R — gradient computation without finite differences |
| femtograd | Minimal forward-mode AD, a teaching implementation |
| hypothesize | Hypothesis testing framework built on the likelihood model |
| dfr_dist | Distribution-free reliability estimation |
| md.tools | Utilities for encoding and decoding masked data |
| wei.series.md.c1.c2.c3 | Weibull series system implementations |
| mdrelax | What happens when the standard masking assumptions (C1/C2/C3) are violated |
| compositional.mle | Compositional extensions to the MLE framework |
The Design Philosophy
Every package follows the same pattern: define an S3 object with generic functions (loglik(), score(), hess_loglik(), fit(), rdata()), make it composable, and let the algebra do the work. If you’ve used one package in the stack, you know the interface for all of them.
This is Stepanov’s philosophy applied to statistics: find the minimal algebraic structure, then build generic algorithms that work on anything satisfying that structure.
Data Structures and Algorithms
Python Libraries (PyPI)
Over 20 packages are published to PyPI under the queelius account, spanning data structures, symbolic computation, CLI tools, and more. The highlights:
- AlgoTree (16 stars) — Generic tree data structures and algorithms: traversal, manipulation, serialization, conversion between representations. (Blog post)
- AlgoGraph — Graph algorithms in the same style
- symlik — Symbolic likelihood computation (blog post)
- rerum — Term rewriting engine
- dotsuite — Boolean algebra operations
- jaf — JSON algebra (merge, diff, patch)
- jsonl-algebra — Algebraic operations on JSONL streams
- fuzzy-infer — Fuzzy logic inference engine
- nfa-tools — NFA manipulation and visualization
- src2md — Source code to markdown conversion
- pfc (Python bindings) — Python interface for the pfc prefix-free coding C++ library
Most of the *tk CLI tools (btk, ctk, ebk, etc.) are also installable as Python packages. The pattern: algebraic objects with well-defined operations—the same philosophy that drives the R and C++ packages, applied to Python.
C++ Libraries
The C++ projects are where the generic programming philosophy is most explicit. They’re inspired by Alexander Stepanov’s work—the idea that algorithms should be parameterized by the algebraic structures they operate on, not by specific data types. Most are header-only C++20 libraries.
Data structures and hashing:
- algebraic_hashing — Hash function composition as algebraic morphisms, with a DSL for combining hash functions (sketched after this list)
- sparse_spatial_hash — N-dimensional sparse spatial hashing for collision detection and neighbor queries
- maph — Space-efficient approximate mappings using perfect hash functions, with configurable storage and accuracy trade-offs
- pfc — Zero-copy, prefix-free data representations with algebraic types and succinct data structures
- cbt — Computational basis transforms between domains
- bloomy — Secure index based on Bloom filters (connecting to the encrypted search work)
- packed_data — Compact data representations
- accumux — Accumulator and multiplexer patterns
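The algebraic_hashing entry above is the cleanest example of treating hash functions as composable values. A Python paraphrase of the idea (the library itself is C++, and its DSL is richer than this):

```python
import hashlib

class Hash:
    """A hash function as a first-class value; >> composes them, morphism-style."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, data: bytes) -> bytes:
        return self.fn(data)
    def __rshift__(self, other):
        # (h1 >> h2)(x) = h2(h1(x)); the composite is itself a Hash,
        # and composition is associative, so pipelines regroup freely.
        return Hash(lambda data: other.fn(self.fn(data)))

sha     = Hash(lambda b: hashlib.sha256(b).digest())
blake16 = Hash(lambda b: hashlib.blake2b(b, digest_size=16).digest())
take8   = Hash(lambda b: b[:8])

pipeline = sha >> blake16 >> take8
bucket = int.from_bytes(pipeline(b"queelius"), "big") % 1024
print(bucket)
```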
Numerical and algebraic computation:
- limes — Composable calculus expressions: symbolic differentiation, numerical integration, algebraic composition
- elementa — Pedagogical C++20 linear algebra library
- gradator — Pedagogical C++20 automatic differentiation library
- dual — Dual number arithmetic for automatic differentiation
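The dual-number trick behind dual (and, in spirit, gradator, femtograd, and nabla) fits in a few lines. A minimal Python sketch of forward-mode AD, not any of those libraries' APIs:

```python
class Dual:
    """Dual numbers a + b*eps with eps^2 = 0: carrying b through ordinary
    arithmetic yields the derivative alongside the value (forward-mode AD)."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))              # seed the input's derivative with 1
print(y.value, y.deriv)            # 57.0 26.0
```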
Simulation:
- barnes-hut — Barnes-Hut tree algorithm for N-body simulation
The Stepanov essay series (Seeing Structure First) explains the philosophy: an algorithm like power(x, n, op) computes x^n under any monoid operation—exponentiation, matrix power, string repetition, path composition. The C++ libraries embody this by treating data structures as instances of algebraic concepts, not as standalone implementations.
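Here is that power(x, n, op) example transcribed into Python (the essays and libraries work in C++): the algorithm asks only that op be associative, so the same dozen lines compute integer powers, repeated strings, and matrix powers.

```python
def power(x, n, op):
    """x combined with itself n times under an associative op,
    in O(log n) applications via repeated squaring (n >= 1)."""
    result = None
    while n > 0:
        if n & 1:
            result = x if result is None else op(result, x)
        x = op(x, x)
        n >>= 1
    return result

def matmul2(a, b):
    """2x2 matrix product, so power() also computes matrix powers."""
    return ((a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]),
            (a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]))

print(power(3, 13, lambda a, b: a * b))            # exponentiation: 1594323
print(power("ab", 4, lambda a, b: a + b))          # string repetition: 'abababab'
print(power(((1, 1), (1, 0)), 10, matmul2)[0][1])  # 10th Fibonacci number: 55
```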
Boolean, Fuzzy, and Symbolic Computation
A cluster of Python projects (all on PyPI) exploring logic beyond true/false:
- dotsuite — Boolean algebra operations and visualization
- jaf — JSON algebra: treating JSON documents as algebraic objects with merge, diff, and patch operations
- fuzzy-infer — Fuzzy logic inference engine
- soft-circuit — Soft Boolean circuits where gates have continuous rather than discrete outputs
- fuzzy-soft-circuit — Combining fuzzy logic with soft circuits
- fuzzy-logic-search — Search systems that use fuzzy logic for relevance scoring
- rerum — Term rewriting engine
- xtk — Expression toolkit for symbolic manipulation
- tree_rewriter — Tree-based term rewriting
This cluster connects to the encrypted search work (Boolean algebra over trapdoor sets) and to the AI work (soft circuits as differentiable logic). The thread: logic is algebraic structure, and generalizing from crisp to fuzzy to soft to encrypted is a matter of changing which algebraic laws you preserve.
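A minimal sketch of that progression, using the product t-norm and probabilistic sum (one common choice among several):

```python
# The same connectives at two points on the crisp -> soft spectrum.
# Crisp gates act on {0, 1}; soft gates act on [0, 1] and are differentiable,
# which is what lets them sit inside learned models.
def crisp_and(a, b): return a & b
def crisp_or(a, b):  return a | b
def crisp_not(a):    return 1 - a

def soft_and(a, b):  return a * b            # product t-norm
def soft_or(a, b):   return a + b - a * b    # probabilistic sum
def soft_not(a):     return 1.0 - a

# De Morgan's law survives the generalization: NOT(a AND b) == (NOT a) OR (NOT b)
a, b = 0.8, 0.3
print(round(soft_not(soft_and(a, b)), 6),
      round(soft_or(soft_not(a), soft_not(b)), 6))  # both 0.76
```

Which laws survive is the whole point: De Morgan holds, but the law of excluded middle does not, since soft_or(a, soft_not(a)) < 1 whenever 0 < a < 1.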
The Unix Toolkit Constellation
Fifteen CLI tools that share a philosophy: SQLite-backed, CLI-first, composable, personal data sovereignty. They use a naming convention—most end in tk (toolkit) or k—and are designed to work together through Unix pipes and shared conventions.
| Tool | Domain | Description |
|---|---|---|
| btk | Bookmarks | SQLite-backed bookmarks with hierarchical tags and a typed query DSL |
| ebk | Ebooks | E-book library management, metadata extraction, format conversion |
| mtk | Email | Local email archival and search |
| ptk | Photos | Photo management with EXIF metadata and tagging |
| atk | Audio | Audio file management and metadata |
| ctk | Conversations | LLM conversation archival and search (blog post) |
| xtk | Expressions | Symbolic expression manipulation |
| jot | Notes | Journaling and note-taking |
| deets | Personal metadata | Identity and metadata management |
| repoindex | Repositories | Git repository database and query tool |
| crier | Social media | Cross-posting blog content to social platforms (12 stars) |
| chop | Images | Image manipulation |
| dapple | Terminal graphics | Rich terminal output and visualization |
| clerk | Workflows | Task and workflow management |
| chartfold | Health records | Personal health record management |
The Philosophy
These tools share a conviction: your personal data should live in local SQLite databases, not in someone else’s cloud. Every tool stores its data in a format you can query with SQL, back up with cp, and inspect with sqlite3. They’re designed to compose through pipes:
# Find bookmarks tagged "research", get their URLs, search your ebooks for related content
btk query --tag research --format url | xargs -I{} ebk search --related {}
The blog series on digital legacy explains the motivation. When I was diagnosed with cancer, the question “what happens to my digital life?” became urgent. These tools are one answer: if your data lives in portable, open formats on hardware you control, it survives you.
AI and LLM Projects
The largest and fastest-growing cluster, spanning reasoning, search, and safety.
LLM Tools and Research
- elasticsearch-lm (37 stars) — Language model integration with Elasticsearch. This bridges the encrypted search research with modern LLM applications.
- mcts-reasoning — Monte Carlo Tree Search for structured reasoning. Uses tree search (connecting to AlgoTree) to improve LLM reasoning quality. (Blog post)
- dreamlog — Logic programming with LLM integration and wake-sleep learning cycles
- complex-network-rag — RAG (Retrieval-Augmented Generation) over complex document networks
- ollama_data_tools (4 stars) — Utilities for working with Ollama models and data
- llm-bayes — Bayesian reasoning with LLMs
- synthdata — Synthetic data generation
- agentum — Unified framework for sequential decision-making, from classical search to deep RL
- itinero — LLM-powered web automation through composable strategies and Playwright
- AutoPoiesi — Self-organizing systems inspired by autopoiesis theory
The Thread
These aren’t random AI projects. The encrypted search work asks: “How do you search without revealing what you know?” The LLM work asks the complement: “How do you reason with what you know?” Both are about the interface between information and computation. elasticsearch-lm literally bridges the two—it’s a language model that searches.
Digital Legacy Projects
Five projects exploring what happens to digital identity after death. This is the Long Echo series in code form.
- longecho — The framework for durable personal data. Redundant storage, format migration, integrity verification.
- longshade — Generate a conversable persona from personal data: conversations, writings, and email
- posthumous — Automated actions triggered by death or incapacity. Dead man’s switches, scheduled messages, data release.
- pagevault — Password-protect semi-private content on static sites like Hugo blogs
- cryptoid — Client-side encrypted content for Hugo static sites with multi-user access control
These are nascent. They share a philosophy but don’t yet share infrastructure. The vision: a unified system where your personal data (managed by the *tk tools) is stored durably (longecho), distilled into a conversable persona (longshade), protected on your sites (pagevault, cryptoid), and handled according to your wishes after death (posthumous).
The essays on digital legacy provide the philosophical backdrop. The tools provide the engineering.
Literature
Three novels, each exploring different aspects of consciousness and intelligence:
- echoes-of-the-sublime — Philosophical horror, approximately 103,000 words. What happens when consciousness encounters something genuinely beyond its capacity to process? (Blog post)
- the-policy — AI alignment science fiction. A near-future story about the consequences of a specific alignment strategy. (Blog series)
- call-of-asheron — Epic fantasy. Emergence, self-organization, and what happens when a world’s underlying rules become visible to its inhabitants.
The novels aren’t decoration. They’re explorations of the same themes that drive the code—consciousness, information, what persists and what doesn’t—in a medium that allows for ambiguity and nuance that research papers can’t.
There’s also stories, a collection of shorter fiction.
The Erdős Problems Database
One project worth mentioning separately: erdosproblems (470 stars) is a community database for the problems listed on erdosproblems.com—Thomas Bloom’s curation of Paul Erdős’s open problems in combinatorics, number theory, and graph theory. My contribution is to the structured data layer: making the problems machine-readable, searchable, and cross-referenced. It reflects the same instinct that drives much of this ecosystem: make knowledge structured and open.
Programming Languages and Interpreters
A small cluster of projects exploring language design:
- jsl — JSON-based scripting language
- jsonl-algebra — Algebraic operations on JSONL streams
- nfa-tools — Non-deterministic finite automaton manipulation
- dagshell — Shell with DAG-based execution
- tex2any / texflow — LaTeX transformation tools
These connect to the symbolic computation cluster (rerum, xtk, tree_rewriter) through their shared concern with formal language manipulation.
Education and Essay Series
Four ongoing essay series, each associated with a repository of supporting material:
- stepanov — Generic programming in the style of Stepanov. Eleven essays on finding algebraic structure in algorithms.
- sicp — Abstraction and composition, inspired by Structure and Interpretation of Computer Programs.
- the-learning-problem — Machine learning from an information-theoretic perspective. Sequential prediction, Solomonoff induction, compression as learning.
- the-long-echo — Digital legacy, personal data sovereignty, and what persists.
Space Simulation
Two projects that stand somewhat apart:
- space-sandbox-sim — N-body physics sandbox
- star-system-sim — Star system generation and simulation
These connect to the C++ algorithmic work (barnes-hut tree algorithm for N-body simulation) and to the generic programming philosophy (physics simulation as algebraic structure).
Infrastructure and Meta-tools
The projects that support everything else:
- metafunctor — This Hugo blog. The central publication venue for all the work above.
- queelius.github.io — GitHub Pages deployment
- queelius.r-universe.dev — R-universe configuration for the R package ecosystem
- src2md — Convert source code to markdown for documentation
- texwatch — Watch LaTeX files and rebuild on change
- sandrun — Sandboxed command execution
- zeroipc — Zero-copy IPC mechanisms (multi-language)
Gap Analysis: What’s Missing
Looking at 120+ projects as a whole, several gaps become visible:
No Unified Documentation Portal
Sixty-two repos have GitHub Pages sites, but there’s no cross-project index. If you want to understand how algebraic.mle connects to likelihood.model connects to likelihood.model.series.md, you have to read three separate documentation sites and mentally stitch them together. A single documentation portal with cross-linked API docs and a dependency graph would make the ecosystem navigable.
Missing CI Infrastructure
Many beta-stage repositories lack continuous integration. The R packages have CI through r-universe, but the Python and C++ projects are inconsistent. A standardized CI template across the ecosystem would catch breakage earlier.
No Rich Python Package Index
Over 20 packages are on PyPI under the queelius account, but the PyPI profile is just a flat list of names. There’s no equivalent to r-universe—no descriptions, no dependency graph, no grouping by theme. A curated index page (or a Python equivalent of r-universe) would make the Python ecosystem as navigable as the R one.
No Cross-Project Dependency Graph
It’s hard to see which projects build on which. The R packages have an implicit dependency tree, but it’s not visualized. The *tk tools share conventions but not code. A visual dependency map would help contributors and users understand the architecture.
C++ Libraries Lack a Unifying Build Mechanism
The C++ projects don’t share a package manager configuration (Conan, vcpkg, or similar). Each library is standalone. For users who want to use multiple libraries together—say, algebraic_hashing with sparse_spatial_hash—there’s no easy integration path.
Scattered Fuzzy Logic
Four separate fuzzy logic projects (fuzzy-infer, fuzzy-logic-search, fuzzy-soft-circuit, soft-circuit) share concepts but not code. They could be unified into a single, layered fuzzy logic library with clear separation between the inference engine, the circuit model, and the search application.
The *tk Tools Need a Meta-Installer
The Unix toolkit tools share a philosophy and naming convention, but there’s no umbrella project, shared library, or meta-installer. Common patterns (SQLite storage, CLI argument parsing, query DSL) are reimplemented in each tool. A shared foundation library would reduce duplication and make it easier to build new *tk tools.
No Benchmarking Suite for Probabilistic Data Structures
The hashing and data structure projects (maph, algebraic_hashing, sparse_spatial_hash, pfc) lack comparative benchmarks. A shared benchmark suite would make performance claims credible and help users choose between implementations.
Digital Legacy Projects Need Integration
The longecho/longshade/posthumous projects articulate a compelling vision but exist as separate, early-stage prototypes. The integration story—how they work together, and how they connect to the *tk tools—hasn’t been built yet.
Missing End-to-End Tutorials
The R packages have individual documentation, but there’s no tutorial showing how to use them together: generate synthetic masked data, fit models, compare via AIC, bootstrap confidence intervals, and visualize results—all using the composable architecture.
Where to Start
If you’ve read this far and want to explore, here are entry points by interest:
If you’re a statistician or reliability engineer: Start with algebraic.mle (on CRAN), then read the likelihood.model.series.md post. The composable likelihood architecture is the most mature part of the ecosystem.
If you’re a C++ programmer interested in generic programming: Start with the Stepanov series, especially Seeing Structure First. Then look at the C++ libraries (algebraic_hashing, maph, pfc) for worked examples of the philosophy.
If you work with trees or graphs in Python: AlgoTree and AlgoGraph are on PyPI and treat data structures as composable algebraic objects.
If you care about personal data sovereignty: Start with btk (bookmarks) or ctk (conversations)—they’re the most polished *tk tools. Then read the Long Echo essays for the philosophical motivation.
If you’re interested in AI/LLM tools: elasticsearch-lm bridges traditional search with language models. mcts-reasoning applies tree search to LLM reasoning.
If you’re a mathematician: The oblivious computing papers apply algebra to cryptography. The information theory posts connect Solomonoff induction to practical prediction.
If you want to read fiction: echoes-of-the-sublime is the longest and most developed. the-policy is the most timely—AI alignment as lived experience.
The Shape of the Whole
If I step back and look at everything together, I see a single question asked in many registers:
How do you build reliable knowledge from noisy, incomplete, and partial observations?
In encrypted search, the noise is intentional—it is the price of privacy. In reliability engineering, the noise is censoring and masking—the data you wish you had. In fuzzy logic, the noise is inherent in vague predicates. In LLM reasoning, the noise is the stochasticity of language models. In digital legacy, the noise is time itself—bit rot, platform decay, forgetting.
The answer, in every case, is the same: algebraic structure. Find the right abstractions, make them compose, and the noise becomes manageable.
What do 120+ projects look like, all at once? Taken together, they look like a person thinking out loud for over a decade. The early projects are tentative—data structures, small algorithms, conference papers. The middle period is the thesis work: rigorous, focused, building toward specific results. The recent projects are more ambitious and more personal—novels, digital legacy tools, an essay about watching intelligence leave the body.
The connecting thread isn’t a technology or a domain. It’s a question: what structures persist? Algebraic structures persist across implementations. Mathematical results persist across paradigms. Well-designed software interfaces persist across refactoring. Stories persist across readers. And if you’re careful about formats and infrastructure, personal data can persist across a lifetime—and beyond.
That’s the ecosystem. It’s incomplete, unevenly documented, and will probably never be finished. But the map is here now, and the territory is open.