Ctk | metafunctor

Below you will find pages that utilize the taxonomy term “Ctk”

Long Echo Comes Alive: From Philosophy to Orchestration

January 20, 2026

A year ago, I wrote about Long Echo as a philosophy for preserving AI conversations across decades. The key insight was graceful degradation: design archives that keep working, at progressively simpler levels, even as the technology around them disappears.

That philosophy has become a tool.

From Philosophy to Tool

The original Long Echo was intentionally not code. It was a set of principles documented in CTK’s repository. The hard problems of parsing, storage, and search were already solved by the domain toolkits: CTK, BTK, and EBK.

What was missing was the unification layer. Each toolkit exports its own ECHO-compliant archive, but combining them into a single browsable experience required manual work. That’s what longecho now handles.

What longecho Does Now

longecho is a CLI tool with five capabilities:

longecho check ~/my-data/       # Validate ECHO compliance
longecho discover ~/            # Find ECHO sources
longecho search ~/ "query"      # Search README descriptions
longecho build ~/my-archive/    # Generate static site
longecho serve ~/my-archive/    # Preview locally via HTTP

The check, discover, and search commands existed in the original specification. What’s new is build and serve, the orchestration layer.

Building a Unified Site

The build command takes a hierarchical archive and generates a static site:

longecho build ~/my-archive/

This produces a site/ directory with:

An index page linking to all sub-archives
Navigation between sources
Automatic linking to existing sub-site builds

If a sub-archive already has its own site/ directory (like CTK’s exports), longecho links to it. Use --bundle to copy everything into a portable, self-contained site.
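
As a rough illustration of the linking behavior described above, here is a minimal Python sketch. It is not longecho's actual implementation: the function names, the README.md filename, and the relative link layout are all assumptions made for the example.

```python
from html import escape
from pathlib import Path


def render_index(root: Path) -> str:
    """Return HTML for an index page linking to each sub-archive of root."""
    items = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Skip the output directory and anything without a README
        # (README.md is an assumed filename).
        if sub.name == "site" or not (sub / "README.md").exists():
            continue
        # Prefer the sub-archive's own site build when one exists.
        if (sub / "site" / "index.html").exists():
            target = f"../{sub.name}/site/index.html"
        else:
            target = f"../{sub.name}/README.md"
        items.append(f'<li><a href="{escape(target)}">{escape(sub.name)}</a></li>')
    return "<!doctype html>\n<ul>\n" + "\n".join(items) + "\n</ul>\n"


def build(root: Path) -> Path:
    """Write site/index.html under root and return its path."""
    out = root / "site"
    out.mkdir(exist_ok=True)
    index = out / "index.html"
    index.write_text(render_index(root), encoding="utf-8")
    return index
```

Linking rather than copying keeps the top-level build cheap. A bundling mode would copy each sub-site into site/ instead of pointing at it.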

Live Preview

The serve command provides local HTTP preview:

longecho serve ~/my-archive/ --port 8000

It builds the site if needed, then serves it for browser viewing.

The Manifest

ECHO compliance requires only a README. But for machine-readable metadata, longecho supports an optional manifest:

version: "1.0"
name: "Alex's Data Archive"
description: "Personal data archive"
sources:
  - path: "conversations/"
    order: 1
  - path: "bookmarks/"
    order: 2
  - path: "ebooks/"
    order: 3

The manifest enables:

Explicit ordering of sources in generated sites
Selective inclusion via the browsable flag
Override names for cleaner presentation
Icon hints for UI presentation

Without a manifest, longecho auto-discovers sub-archives by looking for directories with README files. The manifest provides explicit control when you need it.
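
The interplay between manifest and auto-discovery can be sketched in a few lines of Python. This is an illustration, not longecho's code: the function names and accepted README filenames are assumptions, as is placing the browsable flag on each source entry with a default of true. The manifest is assumed to be already parsed into a dict.

```python
from __future__ import annotations

from pathlib import Path

# Assumed README filenames; the spec only says "a README".
README_NAMES = ("README.md", "README.txt", "README")


def discover(root: Path) -> list[str]:
    """Auto-discover sub-archives: directories that contain a README."""
    return sorted(
        p.name + "/"
        for p in root.iterdir()
        if p.is_dir() and any((p / name).is_file() for name in README_NAMES)
    )


def resolve_sources(root: Path, manifest: dict | None) -> list[str]:
    """Use the manifest's explicit ordering if present, else fall back to discovery."""
    if manifest and manifest.get("sources"):
        entries = [s for s in manifest["sources"] if s.get("browsable", True)]
        entries.sort(key=lambda s: s.get("order", 0))
        return [s["path"] for s in entries]
    return discover(root)
```

The fallback order matters: an archive with no manifest still produces a usable site, and the manifest only adds control on top.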

Long Echo: Photos and Mail

January 19, 2026

The Long Echo toolkit now covers conversations, bookmarks, and ebooks. But two of the most emotionally significant categories of personal data remain: photos and mail.

Both share a troubling pattern: scattered across devices and cloud services, organized by date rather than meaning, vulnerable to platform disappearance. They deserve better.

The Expanding Ecosystem

Tool	Domain	Status
ctk	AI Conversations	stable
btk	Bookmarks & Media	stable
ebk	eBooks	stable
repoindex	Git Repositories	stable
ptk	Photos	incubating
mtk	Mail	incubating

The orchestration layer, longecho, ties these together into a unified personal archive.

PTK: Photo Toolkit

Photos are the most emotionally valuable digital artifacts most people have. They’re also among the worst-managed.

The Problem

Your photo library is probably:

Scattered: Phone, old phones, cloud services, camera imports, messaging app saves
Organized by date: Not by who’s in them, where they were taken, or what they mean
Cloud-dependent: Google Photos, iCloud, Amazon Photos. What happens when you switch?
Unsearchable by content: “Find photos of mom at the beach” isn’t possible
Missing context: Only you know why that blurry photo matters

The Vision

ptk provides:

Unified import from any source:

ptk import ~/Pictures/
ptk import ~/phone-backup/DCIM/
ptk import google-takeout.zip --source google-photos
ptk import icloud-export/ --source icloud

Intelligent organization by multiple dimensions:

ptk shell
ptk:/$ cd /people/mom
ptk:/people/mom$ ls
2019/  2020/  2021/  2022/  2023/  2024/

ptk:/$ cd /locations/beach
ptk:/$ cd /events/christmas-2023
ptk:/$ cd /years/2020/months/march

AI-powered features:

# Face detection and clustering
ptk faces detect --all
ptk faces cluster
ptk faces label cluster-7 "Mom"
ptk faces find "Mom"

# Scene captioning
ptk caption --all --model ollama/llava
ptk search "sunset over water"

# Semantic search
ptk ask "photos from our trip to Colorado"

Preservation guarantees:

# Verify nothing is corrupted
ptk verify --checksums

# Export to durable formats
ptk export ~/archive/photos/ --format longecho
ptk export photos.html --format html-gallery

# Original files always preserved
ptk originals list
ptk originals verify
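
Checksum verification is simple enough to reproduce without ptk. The sketch below shows the general technique in Python, assuming SHA-256 hashes recorded per relative path. How ptk actually stores and checks its checksums is not specified here, so the function names and the recorded mapping are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(recorded: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose hash has changed."""
    bad = []
    for rel, expected in recorded.items():
        file = root / rel
        if not file.is_file() or sha256_of(file) != expected:
            bad.append(rel)
    return bad
```

An empty result means every recorded file is present and byte-identical to what was imported.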

Why SQLite?

Like the other Long Echo tools, ptk uses SQLite for metadata:

# Works even if ptk disappears
sqlite3 photos.db "
  SELECT path, caption, taken_at
  FROM photos
  WHERE caption LIKE '%birthday%'
  ORDER BY taken_at
"

The database stores metadata, face embeddings, captions, and organization. The actual photo files stay in place or are copied to a managed library, your choice.

The Long Echo Toolkit

December 16, 2025

Earlier this year I wrote about Long Echo, a philosophy for preserving AI conversations in ways that stay accessible across decades. The core idea was graceful degradation: systems that fail progressively, not catastrophically.

Since then I’ve built out three tools that apply this thinking to personal digital content more broadly, not just conversations: bookmarks, books, and AI chats. Together they form a system for managing the stuff you actually think with.

The Toolkit

Tool	Domain	Install
CTK	AI Conversations	`pip install conversation-tk`
BTK	Bookmarks & Media	`pip install bookmark-tk`
EBK	eBooks & Documents	`pip install ebk`

All three share a common architecture, but each is specialized for its domain.

Shared Architecture

SQLite-First Storage

Every tool uses local SQLite databases you own. No cloud dependency. Queryable with standard tools even if the CLI disappears tomorrow:

# Works even if the tools are gone
sqlite3 conversations.db "SELECT title FROM conversations WHERE title LIKE '%python%'"
sqlite3 bookmarks.db "SELECT url, title FROM bookmarks WHERE stars = 1"
sqlite3 library.db "SELECT title, author FROM books WHERE favorite = 1"

This is the whole point. The database is the artifact, not the tool.
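
The queries above assume you remember the table and column names. If the documentation is gone too, SQLite can describe itself. The sketch below uses only Python's standard sqlite3 module; the helper name describe is made up for the example, and the demo table is illustrative rather than BTK's real schema.

```python
from __future__ import annotations

import sqlite3


def describe(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each user table in the database to its column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE bookmarks (url TEXT, title TEXT, stars INTEGER)")
    print(describe(demo))
```

Point it at any of the databases with sqlite3.connect("bookmarks.db") and you have enough to write queries by hand.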

Interactive Shells with Virtual Filesystems

Navigate your data like a Unix filesystem:

$ btk shell
btk:/$ cd tags/programming/python
btk:/tags/programming/python$ ls
3298  4095  5124  (bookmark IDs)
btk:/tags/programming/python$ cat 4095/title
Advanced Python Techniques

$ ebk shell
ebk:/$ cd authors/Knuth
ebk:/authors/Knuth$ ls
The Art of Computer Programming Vol 1
The Art of Computer Programming Vol 2

Reading Queues

Track what you’re reading, watching, or working through:

# Bookmarks
btk queue add 42 --priority high
btk queue next
btk queue progress 42 --percent 75
btk queue estimate-times  # Auto-estimate from content length

# Books
ebk queue add "Gödel, Escher, Bach"
ebk queue next
ebk queue list

LLM Integration

All three integrate with LLMs for tagging, summarization, and search:

# Auto-tag using content analysis
btk content auto-tag --all
ctk auto-tag --model ollama/llama3
ebk enrich 42  # Enhance metadata with LLM

# Natural language queries
ctk say "summarize my conversations about Rust"
btk ask "find articles about distributed systems"
ebk similar "Gödel, Escher, Bach"  # Semantic similarity

Network Analysis

Find relationships in your data:

# CTK: Conversation networks
ctk net embeddings --all
ctk net similar 42
ctk net clusters
ctk net central  # Most connected conversations
ctk net outliers  # Isolated conversations

# BTK: Bookmark graphs
btk graph build
btk graph analyze
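
To make the idea behind these commands concrete, here is a small pure-Python sketch of a similarity network: link items whose embeddings are close, then rank by number of links. This illustrates the general technique only. The cosine threshold, the function names, and the use of degree as the centrality measure are assumptions, not a description of what ctk net computes.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_graph(
    embeddings: dict[int, list[float]], threshold: float = 0.8
) -> dict[int, set[int]]:
    """Connect every pair of items whose similarity meets the threshold."""
    ids = sorted(embeddings)
    edges: dict[int, set[int]] = {i: set() for i in ids}
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def central(edges: dict[int, set[int]], top: int = 3) -> list[int]:
    """Items with the most neighbours first (ties broken by ID)."""
    return sorted(edges, key=lambda i: (-len(edges[i]), i))[:top]


def outliers(edges: dict[int, set[int]]) -> list[int]:
    """Items with no neighbours at all."""
    return [i for i in edges if not edges[i]]
```

The pairwise loop is quadratic, which is fine for a few thousand conversations. Beyond that an approximate nearest-neighbour index would be the usual next step.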

Web Servers

Browse your archives in a web UI:

Everything is a File: Virtual Filesystems for CLI Data Tools

October 20, 2025

I had a bookmark manager. Then an ebook library manager. Then a chat history manager. Each started with the standard CRUD CLI:

btk add https://example.com --tags python,tutorial
btk list --tag python
btk search "async"
btk delete 1234

ebk import book.pdf --author "Knuth"
ebk list --author Knuth
ebk search "algorithms"

This works fine until you have 10,000+ bookmarks organized with hierarchical tags like programming/python/async, research/ml/transformers, work/clients/acme. Your ebook library has similar structure. Your exported chat conversations from Claude, ChatGPT, and Copilot are piling up.

Traditional CRUD commands become unwieldy:

btk list --tag programming/python/async/io --format json | jq '.[].title'
ebk list --category "Computer Science/Algorithms/Graph Theory" --limit 50
ctk search "machine learning" --source ChatGPT --date-from 2024-01-01

Each command requires precise arguments. Each tool has different flag conventions. You can’t navigate your data. You can only query it. And queries require knowing exactly what you’re looking for.

The insight: everything is a file

When I have thousands of source files organized in directories, I don’t run:

list-files --path /src/components/auth --extension .tsx

I run:

cd src/components/auth
ls *.tsx

The difference matters. With a filesystem, I can navigate incrementally (cd from general to specific), explore (ls to see what’s there), compose (cat file | grep pattern | wc -l), and use familiar tools (find, grep, xargs, pipes, redirection).

What if my bookmarks, ebooks, and chat histories were filesystems?

The pattern

Over the past year, I built six Python tools that all follow the same architecture:

Tool	Domain	VFS Root Structure
btk	Bookmarks	`/bookmarks/`, `/tags/`, `/recent/`, `/domains/`, `/unread/`, `/popular/`
ebk	Ebook library	`/books/`, `/authors/`, `/series/`, `/subjects/`, `/recent/`, `/unread/`
ctk	Chat conversations	`/conversations/`, `/sources/`, `/topics/`, `/starred/`, `/recent/`
ghops	Git repositories	`/repos/`, `/languages/`, `/topics/`, `/stars/`, `/recent/`
infinigram	N-gram models	`/datasets/`, `/models/`, `/corpora/`
AlgoTree	Tree structures	`/nodes/`, `/paths/`, `/subtrees/`

Each tool provides:

A stateless CLI for scripting: btk bookmark add URL, ebk import book.pdf
An interactive shell with a virtual filesystem: btk shell, ebk shell, ctk chat
POSIX-like commands: cd, ls, pwd, cat, mv, cp, rm, find, grep
Unix pipeline support: most commands output JSONL by default for piping

The interesting part is the shell.
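
To give a feel for what such a shell does underneath, here is a toy read-only version in Python. It is a sketch under simplifying assumptions: bookmarks are an in-memory dict, the root is the tag hierarchy itself rather than a /tags/ directory, and the class name TagFS is invented for the example. The real tools are backed by SQLite and expose more roots.

```python
from __future__ import annotations


class TagFS:
    """Read-only virtual filesystem over hierarchical tags like 'programming/python'."""

    def __init__(self, bookmarks: dict[int, dict]):
        self.bookmarks = bookmarks
        self.cwd: tuple[str, ...] = ()

    def _resolve(self, path: str) -> tuple[str, ...]:
        """Turn an absolute or relative path into a tuple of tag segments."""
        parts = [] if path.startswith("/") else list(self.cwd)
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                if parts:
                    parts.pop()
            else:
                parts.append(part)
        return tuple(parts)

    def ls(self, path: str = ".") -> list[str]:
        """List sub-tags (as directories) and bookmark IDs tagged exactly here."""
        here = self._resolve(path)
        dirs, files = set(), set()
        for bookmark_id, bookmark in self.bookmarks.items():
            for tag in bookmark["tags"]:
                segments = tuple(tag.split("/"))
                if segments[: len(here)] != here:
                    continue
                if len(segments) > len(here):
                    dirs.add(segments[len(here)] + "/")
                else:
                    files.add(str(bookmark_id))
        return sorted(dirs) + sorted(files)

    def cd(self, path: str) -> None:
        """Change directory; an empty listing means the tag path does not exist."""
        target = self._resolve(path)
        if target and not self.ls("/" + "/".join(target)):
            raise FileNotFoundError(path)
        self.cwd = target

    def pwd(self) -> str:
        return "/" + "/".join(self.cwd)
```

Note that nothing is materialized: each ls is just a filter over the data, which is why the same pattern transfers to any store that can answer "what lives under this prefix?".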

Navigating 10,000 bookmarks

Live recording captured with asciinema. You can pause, copy text, and replay. The entire recording is 78KB of text.

CTK: Conversation Toolkit

October 9, 2025

CTK manages AI conversations across platforms. Import from ChatGPT, Claude, Copilot, Gemini. Store locally in SQLite. Search, tag, export. Keep everything.

The Problem

If you use multiple AI assistants, your conversations are scattered across incompatible platforms, unsearchable, and dependent on companies that may not exist in 20 years. ChatGPT lives in OpenAI’s web app. Claude is siloed in Anthropic’s interface. Copilot chat history is buried in VS Code storage.

You can’t search across them. You can’t back them up in a unified format. You can’t own them.

The Key Insight: Conversations Are Trees

Most tools treat conversations as linear sequences. They’re not. ChatGPT’s “regenerate” feature creates branches. Claude supports conversation forking. Even a simple “let me try that again” is a tree operation.

User: "Write a poem"
  ├── Assistant (v1): "Roses are red..."
  └── Assistant (v2): "In fields of gold..."  [regenerated]
      └── User: "Make it longer"
          └── Assistant: "In fields of gold, where sunshine..."

CTK stores all conversations as trees. Linear chats are single-path trees. Branching conversations preserve every path. This means you never lose a regeneration, and you can export any path you want.
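
The tree model is easy to sketch. The Python below is a minimal illustration of the idea, not CTK's schema: the Message class, its fields, and the paths helper are hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    text: str
    children: list[Message] = field(default_factory=list)

    def reply(self, role: str, text: str) -> Message:
        """Attach a child message (a regeneration is just a second child)."""
        child = Message(role, text)
        self.children.append(child)
        return child


def paths(node: Message) -> list[list[Message]]:
    """Every root-to-leaf path; a linear chat yields exactly one."""
    if not node.children:
        return [[node]]
    return [[node] + rest for child in node.children for rest in paths(child)]


if __name__ == "__main__":
    root = Message("user", "Write a poem")
    root.reply("assistant", "Roses are red...")
    v2 = root.reply("assistant", "In fields of gold...")
    v2.reply("user", "Make it longer").reply(
        "assistant", "In fields of gold, where sunshine..."
    )
    for path in paths(root):
        print(" -> ".join(m.role for m in path))
```

Exporting "any path you want" then amounts to picking one of the lists returned by paths and serializing it as a linear chat.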

What It Does

# Import from any platform
ctk import chatgpt_export.json --db my_chats.db
ctk import claude_export.json --db my_chats.db --format anthropic
ctk import ~/.vscode/workspaceStorage --db my_chats.db --format copilot

# Search across everything
ctk search "python async" --db my_chats.db

# Natural language queries via LLM tool calling
ctk say "find conversations about distributed systems" --db my_chats.db

# Interactive TUI for browsing and chatting
ctk chat --db my_chats.db

# Export for fine-tuning, archival, or publishing
ctk export training.jsonl --db my_chats.db --format jsonl
ctk export archive.html --db my_chats.db --format html5
ctk export archive/ --db my_chats.db --format markdown

Plugin Architecture

Adding a new provider is one file. Implement ImporterPlugin, drop it in the integrations folder, done. Auto-discovered at runtime. No registry, no config.

Currently supported: OpenAI/ChatGPT (full tree), Anthropic/Claude (full tree), GitHub Copilot, Google Gemini, generic JSONL, coding agents (Cursor, Windsurf).

Privacy

100% local. No telemetry. Optional sanitization strips API keys, passwords, and personal identifiers before export.

ctk export clean_export.jsonl --db chats.db --format jsonl --sanitize
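
For a sense of what sanitization involves, here is a minimal regex-based sketch in Python. The patterns are illustrative assumptions (an sk- style key, a password assignment, an email address) and are not the rules CTK applies. Real redaction needs a broader and more carefully tested pattern set.

```python
import re

# Illustrative patterns only; each pairs a regex with its replacement.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"(?i)(password\s*[:=]\s*)\S+"), r"\1[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


def sanitize(text: str) -> str:
    """Apply every redaction pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running the redaction at export time rather than at import keeps the local database complete while making shared copies safer.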

HTML5 Export

The HTML5 exporter produces a self-contained file with embedded search, tree visualization, and dark mode. No server, no internet, no dependencies. The file works offline in any browser, and you can even continue conversations with a local LLM directly from the exported HTML.

Long Echo: Designing for Digital Resilience Across Decades

January 6, 2025

Update (January 2026): Since this post was written, longecho has evolved from specification to implementation. See Long Echo Comes Alive for the current state including build, serve, and manifest features.

Not Resurrection. Not Immortality.

Just love that still responds.

That’s the idea behind Long Echo. It’s a project about preserving conversations with AI assistants so they stay accessible and meaningful across decades. Not digital ghosts that autonomously post to social media. Not trying to resurrect anyone. Just making sure the knowledge and care captured in these conversations can still be found, searched, and used when the original software is gone.

The Problem

We’re having important conversations with AI assistants:

Teaching moments with students
Advice we’d give our children
Technical problems we’ve solved
Creative work we don’t want to lose
Personal growth tracked over years

But these conversations are trapped in proprietary formats, scattered across platforms (ChatGPT, Claude, Gemini, Copilot), and dependent on companies that may not exist in 50 years.

What happens when you want to find that debugging advice from 2024? What if your children want to search your conversations after you’re gone? What if the company shuts down their API?

The Philosophy: Graceful Degradation

The core idea is graceful degradation, designing systems that fail progressively, not catastrophically:

Level 1: Full functionality  → CTK with semantic search, RAG, beautiful TUI
Level 2: Database queries    → SQLite direct queries (CTK gone, SQLite remains)
Level 3: File search         → grep through JSONL files (just text tools)
Level 4: Human reading       → Markdown, HTML (readable without any tools)
Level 5: Ultimate fallback   → Plain text in notepad

Each level still works even if everything above it is gone.
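
The same ladder can be written as a fallback chain. The Python sketch below is one way to express it, not part of CTK: it tries a SQLite query first, then scans JSONL exports, then plain Markdown or text files. The conversations.db filename and the conversations.title column follow the examples used elsewhere in these posts; the function name and file patterns are assumptions.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def search(archive: Path, needle: str) -> tuple[str, list[str]]:
    """Search with the richest method still available.

    Returns (level, hits), where level names the fallback that answered.
    """
    db = archive / "conversations.db"
    if db.is_file():
        try:
            with closing(sqlite3.connect(db)) as conn:
                rows = conn.execute(
                    "SELECT title FROM conversations WHERE title LIKE ?",
                    (f"%{needle}%",),
                ).fetchall()
            return "sqlite", [row[0] for row in rows]
        except sqlite3.Error:
            pass  # unreadable file or unexpected schema: drop a level

    # Lower levels are plain line scans, the moral equivalent of grep.
    for level, patterns in (("jsonl", ("*.jsonl",)), ("text", ("*.md", "*.txt"))):
        files = sorted(f for pat in patterns for f in archive.glob(pat))
        if files:
            hits = [
                line.strip()
                for f in files
                for line in f.read_text(encoding="utf-8", errors="replace").splitlines()
                if needle.lower() in line.lower()
            ]
            return level, hits
    return "none", []
```

Each step down loses structure but not content, which is the property the levels are meant to guarantee.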

The Discovery: CTK Already Solved This

I started building Long Echo as a separate system. I designed multi-format importers, search with fallbacks, memory extraction pipelines. Complex architecture diagrams. Deployment strategies. The whole thing.

Then I realized that CTK (Conversation Toolkit), which I had built earlier, already solved all the hard problems.

CTK already provides:

Import from all platforms (unified API)
Conversation trees (handles branching, regenerations)
SQLite storage (local, queryable, persistent)
Multiple export formats (JSONL, Markdown, HTML, JSON)
Full-text search + LLM-powered queries
Complex network RAG (coming soon)
Terminal UI

Everything I was designing was already built. By me. Earlier.

This wasn’t failure. I’d already built the foundation without realizing it. The hard problems (conversation parsing, unified representation, search, storage) were handled. What Long Echo needed wasn’t more code. It needed a philosophy.

Discovering ChatGPT: Reconnecting with AI Research

December 8, 2022

I finally noticed ChatGPT this week. Everyone’s been talking about it, but I was buried in cancer treatment, chemo recovery, surgery prep, and thesis work on Weibull distributions.

When I finally tried it, my reaction wasn’t surprise at the technology itself.

It was: “This makes sense. The pieces were all there.”

Why I Missed It

GPT-3 came out in 2020. I was dealing with:

Stage 3 cancer diagnosis
Chemotherapy
Mathematical statistics coursework
Thesis research on masked failure data
Surgery and recovery

I had no attention left for tracking ML developments. The world moved on. I was focused on survival.

The Theoretical Foundation

I’ve been interested in Marcus Hutter and Ray Solomonoff’s work for years.

Solomonoff induction: optimal prediction is compression. Intelligence is sequence prediction. The smallest program that generates your observations is the best predictor of what comes next.

Hutter’s AIXI: intelligence = optimal compression-based prediction combined with reward-maximizing action. AIXI itself is incomputable; the resource-bounded variant is AIXItl.

During my CS master’s, I proposed working on sequence prediction as a thesis topic, inspired by Solomonoff. The professor wasn’t interested. I ended up doing encrypted search instead.

But the intuition stayed: prediction ~ compression ~ intelligence.
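
For readers who want the formal version of that intuition, the standard statement of Solomonoff's universal prior is given below. Here U is a universal monotone machine, the sum runs over programs p whose output begins with the string x, and ℓ(p) is the length of p in bits. The notation is the usual textbook one and is not taken from this post.

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
\qquad
M(x_{n+1} \mid x_{1:n}) \;=\; \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}
```

The shortest program that reproduces the data contributes the largest single term, 2^{-ℓ(p)}, which is the sense in which good compression and good prediction coincide.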

The Bitter Lesson

Rich Sutton’s “The Bitter Lesson” laid it out: scaling compute and data beats clever algorithms.

The lesson from 70 years of AI research: general methods that use computation win. Hand-crafted features lose. Search and learning scale. Everything else doesn’t.

I read that essay and found it compelling. But there’s a difference between understanding theory and watching it play out at scale. OpenAI was actually doing the scaling while I was working on other problems.

ImageNet Should Have Been the Signal

In retrospect, deep neural networks winning the ImageNet challenge in 2012 was the canary. A simple architecture (CNNs), massive data, lots of compute, and within a few years you get superhuman image classification.

That was the proof: scale works.

GPT is the same pattern:

Simple architecture (transformers)
Massive data (internet-scale text)
Enormous compute (thousands of GPUs)

Result: something that looks disturbingly intelligent.

Connecting the Dots

The theoretical framework was there:

Solomonoff: intelligence is compression
Hutter: optimal prediction and action, with resource-bounded approximations
Sutton: scaling beats cleverness

The empirical evidence accumulated: