The *-memex Family

Why the *-memex family exists and the six commitments that hold it together.

June 18, 2026

In 1945 Vannevar Bush described the memex: a device in which a person keeps their books, records, and correspondence, mechanized so it can be consulted with speed and flexibility, “an enlarged intimate supplement to” memory. The part he cared about was not the storage. It was trails: associative paths a person builds through their own material, so that a line of thought, once traced, can be walked again and handed to someone else.

Eighty years on, the storage is free and the trails still barely exist. We have search, which finds a thing if you can already describe it, and feeds, which show you what someone else chose for you. Neither remembers for you, on your terms, in a way you can retrace years later.

The *-memex family is an attempt to build the thing Bush actually described: for one person, on hardware they control, in formats that outlive the tools, and reachable by both that person and the AI agents that now read alongside them.

What it is, and what it is not

It is not a product. There is nothing to sign up for and no server in the middle.

It is not academic research. There are no papers, no benchmarks, no novelty claim to defend. It is a design ethos expressed as working software. The tools are the argument: each one is a wager that personal memory should be structured a particular way, and the wager either pays off in daily use or it does not.

It is a family, not one program. Each archive covers a single domain, is versioned on its own clock, and is replaceable without touching the others. A federation layer sits above them and holds the cross-archive memory: trails, one semantic index over everything, resolution of references, and a graph that is still mostly on paper.

The six commitments

These are the hard core. Everything below them is implementation detail that is allowed to change.

1. The corpus is one person’s, and that person owns it. Every archive is a personal corpus of one domain, stored as a SQLite file on a disk you control. Not a SaaS account, not a row in someone else’s multi-tenant table, not a thing that vanishes when a company pivots. You can open it with the sqlite3 binary today, and because SQLite has a published format and a forever-promise, the same way in 2050. Ownership here is not a license term. It is a property of where the bytes sit and what reads them.

2. Narrow archives, smart federation. Each archive does one job thinly: import, store, index, expose. It is not clever about other domains and not clever about intelligence. The cross-archive work lives in the federation layer instead. Thin archives are easy to reason about and easy to throw away and rewrite; the smart layer can swap models, add a graph, or change how trails work without forcing edits into seven codebases. Trails are the federation’s first object, for the reason Bush put them at the center: a search result is something you found once, a trail is something you can walk again.

3. Build for the thirty-year test. Some data is authored (marginalia, notes, tags, trails, the annotations made by hand) and cannot be reconstructed. Some is derived (indexes, imported records, embeddings) and costs only CPU to rebuild. The design keeps these apart so a rebuild never destroys authored work, and the durability rules exist to protect the authored tier: SQLite for storage, arkiv JSONL for interchange, .zip and .gz rather than newer compressors, and HTML exports that vendor their dependencies and run offline. An archive that only works while you are around to babysit it is a diary you have to rewrite every morning. The point is the opposite: that it keeps working when you are busy, when you are sick, and after you are gone. That last clause is the long echo.

4. Expose primitives, not oracles. The reader is a human, or that human’s AI agent, or both. Each archive exposes primitives: read-only SQL, full-text search, paginated retrieval, reference resolution, a schema you can ask for. It does not ship an “ask my email a question” tool that calls a model inside itself, because when the caller is already an agent a nested model is the weakest link: it caps quality, adds a round trip, and hides the raw records the outer agent could reason over directly. The archive is a library with an honest card catalog, not an oracle that answers in its own voice.

5. References are strings. A reference to a record in another archive is a URI, <archive>://<kind>/<id>, with a #fragment for a position inside it. There are no foreign keys and no central database that has to be up for a reference to mean something. A plain string composes through a note, the clipboard, a trail step, a chat message. Durable content-derived IDs and soft delete keep it resolvable across re-imports, renames, and deletions. It is the web’s addressing model pointed at your own memory.

6. Anything you publish is safe by default. Every exported artifact is treated as potentially public: no hardcoded endpoint, no authentication on the author’s behalf, no remote fallback for code that matters. You should be able to hand someone a single file and have it be exactly as private as the data you chose to include, and no more.

The family

Seven archives, each one domain, plus the federation layer that remembers across them.

ArchiveDomain
llm-memexAI chat conversations (ChatGPT, Claude, Gemini, Claude Code)
mail-memexPersonal email (mbox, eml, IMAP)
bookmark-memexBookmarks and web clippings
photo-memexPhoto library
book-memexeBook library plus reading marginalia and highlights
health-memexMedical records across EHR systems
hugo-memexSite content
memex (federation)Trails, the cross-archive semantic index, reference resolution, graph

Every archive satisfies the same contract: a SQLite database, an MCP server an agent can query, a thin import/export CLI, arkiv export, durable record IDs, and free-form marginalia that survive their target being deleted.

Where it sits

The *-memex family is the memory limb of a wider set of commitments about durable personal data. longecho is the durability philosophy and tooling these archives inherit. arkiv is the interchange format they all export to: JSONL in, SQL out. Read together they are one position stated several ways: that a person’s records are theirs, should be legible to themselves and their agents, and should still be legible when the tools, the companies, and eventually the person are gone.

Topics

#memex #personal data #data sovereignty #durability #longecho #mcp #llm agents #vannevar bush