
CTK: Conversation Toolkit for Managing AI Conversations Across All Platforms

CTK (Conversation Toolkit) is a powerful, plugin-based system for managing AI conversations from multiple providers. Import, store, search, and export your conversations in a unified tree format while preserving provider-specific details. Built to solve the fragmentation of AI conversations across ChatGPT, Claude, Copilot, and other platforms.

The Fragmentation Problem

If you use multiple AI assistants, you’ve experienced this pain:

  • ChatGPT conversations live in OpenAI’s web app
  • Claude conversations are siloed in Anthropic’s interface
  • GitHub Copilot chat history is buried in VS Code storage
  • Local LLMs (Ollama, etc.) have no standard export format

Result: Your valuable conversations are scattered across incompatible platforms, unsearchable, and at risk of being lost if a provider shuts down or changes their export format.

CTK’s Solution: Universal Tree Format

CTK provides a universal tree representation for all conversations:

User: "What is Python?"
  └── Assistant: "Python is a programming language..."
      └── User: "How do I install it?"
          └── Assistant: "You can install Python by..."

Key insight: Linear chats are just single-path trees. Branching conversations (like ChatGPT’s “regenerate” feature) are multi-path trees:

User: "Write a poem"
  ├── Assistant (v1): "Roses are red..."
  └── Assistant (v2): "In fields of gold..."  [regenerated]
      └── User: "Make it longer"
          └── Assistant: "In fields of gold, where sunshine..."

This tree representation preserves all branching structure from any provider while providing a uniform interface for search, export, and analysis.
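
For readers who prefer code, here is a rough sketch of how that branching example maps onto the model classes used later in this post (ConversationTree, Message, MessageContent, MessageRole). The explicit message ids and the MessageRole.ASSISTANT member are assumptions for illustration; the real constructors may differ.

# Sketch only: id= on Message and MessageRole.ASSISTANT are assumed;
# add_message(parent_id=...) and get_all_paths() appear in the Python API below.
from ctk.core.models import ConversationTree, Message, MessageContent, MessageRole

tree = ConversationTree(id="poem", title="Write a poem")

prompt = Message(id="m1", role=MessageRole.USER,
                 content=MessageContent(text="Write a poem"))
tree.add_message(prompt)

# Two sibling assistant replies: the original and a regeneration.
v1 = Message(id="m2", role=MessageRole.ASSISTANT,
             content=MessageContent(text="Roses are red..."))
v2 = Message(id="m3", role=MessageRole.ASSISTANT,
             content=MessageContent(text="In fields of gold..."))
tree.add_message(v1, parent_id="m1")
tree.add_message(v2, parent_id="m1")

# The follow-up continues only from the regenerated branch.
follow = Message(id="m4", role=MessageRole.USER,
                 content=MessageContent(text="Make it longer"))
tree.add_message(follow, parent_id="m3")

# Each root-to-leaf path is one linear conversation.
print(len(tree.get_all_paths()))  # 2 paths in this example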

Quick Start

# Setup
git clone https://github.com/queelius/ctk.git
cd ctk
make setup
source .venv/bin/activate

# Import from multiple providers
ctk import chatgpt_export.json --db my_chats.db
ctk import claude_export.json --db my_chats.db --format anthropic
ctk import ~/.vscode/workspaceStorage --db my_chats.db --format copilot

# Search with beautiful tables
ctk search "python async" --db my_chats.db --limit 10

# Natural language queries powered by LLM
ctk ask "show me conversations about machine learning" --db my_chats.db

# Interactive TUI
ctk chat --db my_chats.db

# Export for fine-tuning
ctk export training.jsonl --db my_chats.db --format jsonl

Core Features

🌳 Universal Tree Format

All conversations are stored as trees—linear chats are single-path trees, branching conversations preserve all paths.

Benefits:

  • Preserves all regenerations and variants from ChatGPT
  • Supports conversation forking in Claude
  • Captures branching from any provider
  • Enables path selection during export (longest, first, latest)

🔌 Plugin Architecture

CTK auto-discovers importers and exporters. Adding support for a new provider is trivial:

# File: ctk/integrations/importers/my_format.py
from ctk.core.plugin import ImporterPlugin
from ctk.core.models import ConversationTree, Message, MessageContent, MessageRole

class MyFormatImporter(ImporterPlugin):
    name = "my_format"
    description = "Import from My Custom Format"
    version = "1.0.0"

    def validate(self, data):
        """Check if data is your format"""
        return "my_format_marker" in str(data)

    def import_data(self, data, **kwargs):
        """Convert data to ConversationTree objects"""
        tree = ConversationTree(id="conv_1", title="Imported")
        msg = Message(
            role=MessageRole.USER,
            content=MessageContent(text="Hello")
        )
        tree.add_message(msg)
        return [tree]

Place the file in the integrations folder—done! The plugin is automatically discovered at runtime.
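
Once the file is in place, the plugin can be exercised through the same registry API used later in this post. A minimal sketch, assuming the plugin's name attribute doubles as the format string accepted by import_file (as the provider table below suggests):

from ctk import registry

# "my_format" matches the plugin's name attribute from the example above.
convs = registry.import_file("my_data.json", format="my_format")
print(f"Imported {len(convs)} conversation(s)")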

💾 SQLite Backend

Fast, searchable local database with proper indexing:

Schema:

  • conversations - Metadata, title, timestamps, source, model
  • messages - Content, role, parent/child relationships
  • tags - Searchable tags per conversation
  • paths - Cached conversation paths for fast retrieval

Performance:

  • Full-text search across thousands of conversations (instant)
  • Indexed queries for filtering by source, model, date, tags
  • Efficient tree traversal with cached paths
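
Because the backend is plain SQLite, the database can also be inspected outside of CTK with the standard library. A small sketch; the table name comes from the schema summary above, but the source column is an assumption and may not match the actual schema:

import sqlite3

con = sqlite3.connect("my_chats.db")
# Count conversations per provider; "source" is assumed from the schema notes above.
for source, count in con.execute(
    "SELECT source, COUNT(*) FROM conversations GROUP BY source"
):
    print(f"{source}: {count}")
con.close()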

🔒 Privacy First

100% local:

  • No data leaves your machine
  • No analytics or telemetry
  • No cloud dependencies

Optional secret masking:

# Remove secrets before sharing
ctk export clean_export.jsonl --db chats.db --format jsonl --sanitize

# Removes:
# - API keys (OpenAI, Anthropic, AWS, etc.)
# - Passwords and tokens
# - SSH keys
# - Database URLs
# - Credit card numbers

Custom sanitization rules:

from ctk.core.sanitizer import Sanitizer, SanitizationRule
import re

sanitizer = Sanitizer(enabled=True)

# Company-specific patterns
sanitizer.add_rule(SanitizationRule(
    name="internal_urls",
    pattern=re.compile(r'https://internal\.company\.com/[^\s]+'),
    replacement="[INTERNAL_URL]"
))
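
Under the hood a rule is just a compiled pattern plus a replacement string. Stripped of the CTK wrapper, the rule above behaves like a plain re.sub, shown here only to make the effect concrete:

# Standalone view of what the rule above does to a piece of text.
text = "See https://internal.company.com/wiki/runbook for details"
print(re.sub(r'https://internal\.company\.com/[^\s]+', "[INTERNAL_URL]", text))
# -> "See [INTERNAL_URL] for details"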

Search & Discovery

# Search with Rich table output
ctk search "machine learning" --db chats.db

# Advanced filtering
ctk search "python" --db chats.db --source ChatGPT --model GPT-4
ctk search "async" --db chats.db --tags "code,tutorial" --limit 20

# Date ranges
ctk search "AI" --db chats.db --date-from 2024-01-01 --date-to 2024-12-31

Output: Beautiful Rich tables with color-coded sources, models, and message counts.

Natural Language Queries

The killer feature: Ask questions in plain English using LLM-powered interpretation:

ctk ask "show me starred conversations" --db chats.db
ctk ask "find discussions about async python" --db chats.db
ctk ask "conversations from last week about AI" --db chats.db

How it works: The LLM interprets your natural language query and translates it into the appropriate database operations (filter by date, search text, check tags, etc.).
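
CTK handles this internally, but the general pattern is easy to sketch: ask the model for a structured filter, then map that filter onto ordinary search and list operations. The prompt and JSON shape below are purely illustrative and are not CTK's actual implementation:

import json

SYSTEM_PROMPT = (
    "Translate the user's request into JSON with optional keys: "
    "query (text to search), tags (list of strings), starred (bool), "
    "date_from and date_to (ISO dates). Reply with JSON only."
)

def interpret(request: str, llm_call) -> dict:
    """llm_call is any callable that sends a prompt to an LLM and returns its text reply."""
    reply = llm_call(f"{SYSTEM_PROMPT}\n\nRequest: {request}")
    return json.loads(reply)

# "conversations from last week about AI" might come back as:
# {"query": "AI", "date_from": "2024-11-04", "date_to": "2024-11-11"}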

Smart Tagging

Three ways to organize conversations:

  1. Auto-tags by provider and model: ChatGPT, GPT-4, Claude, Sonnet-3.5
  2. Manual tags: ctk import --tags "work,2024"
  3. LLM auto-tagging: Analyzes conversation content and suggests relevant tags
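
LLM auto-tagging follows the same pattern as the ask command: send the conversation text to a model and parse a short tag list back. A hedged sketch of the idea, not CTK's actual prompt or code:

def suggest_tags(conversation_text: str, llm_call, max_tags: int = 5) -> list[str]:
    """llm_call is any callable mapping a prompt string to the model's text reply."""
    prompt = (
        f"Suggest up to {max_tags} short topical tags for this conversation, "
        f"as a comma-separated list:\n\n{conversation_text[:4000]}"
    )
    return [t.strip() for t in llm_call(prompt).split(",") if t.strip()]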

Interactive TUI

Launch the terminal UI for visual conversation management:

ctk chat --db chats.db

TUI Features

Navigation & Browsing:

  • Browse conversations with Rich table view
  • Emoji flags for status: ⭐ (starred) 📌 (pinned) 📦 (archived)
  • Quick search and natural language queries
  • Tree view for branching conversations
  • Path navigation in multi-branch trees

Conversation Management:

  • Create, rename, delete conversations
  • Star, pin, archive operations in real-time
  • Auto-tagging with LLM
  • Export to various formats from within TUI

Live Chat:

  • Chat with any LLM provider (Ollama, OpenAI, Anthropic)
  • Model Context Protocol (MCP) tool support
  • Fork conversations to explore alternatives
  • Edit and regenerate messages
  • Switch between conversation paths

TUI Commands

# Navigation
/browse              # Browse conversations table
/show <id>           # Show conversation
/tree <id>           # View tree structure
/paths <id>          # List all paths

# Search & Query
/search <query>      # Full-text search
/ask <query>         # Natural language query (LLM-powered)

# Organization
/star <id>           # Star conversation
/pin <id>            # Pin conversation
/archive <id>        # Archive conversation
/title <id> <title>  # Rename conversation

# Chat Operations
/fork                # Fork current conversation
/regenerate          # Regenerate last message
/edit <msg_id>       # Edit a message
/model <name>        # Switch LLM model

# Export & Tools
/export <format>     # Export current conversation
/tag                 # Auto-tag with LLM
/help                # Show all commands
/quit                # Exit TUI

Supported Providers

Importers

Provider             Format         Branch Support   Notes
OpenAI (ChatGPT)     openai         ✅ Full tree     Preserves all regenerations
Anthropic (Claude)   anthropic      ✅ Full tree     Supports conversation forking
GitHub Copilot       copilot        ❌ Linear        Auto-finds VS Code storage
Google Gemini        gemini         ✅ Partial       Bard conversations
Generic JSONL        jsonl          ❌ Linear        For local LLMs (Ollama, LM Studio)
Coding Agents        coding_agent   ❌ Linear        Cursor, Windsurf, etc.

Exporters

Format     Description                              Use Case
JSONL      One conversation per line                Fine-tuning datasets
JSON       Native CTK format                        Backup, transfer between databases
Markdown   Human-readable with tree visualization   Documentation, sharing
HTML5      Interactive browsing with search         Web publishing, archival

Import Examples

ChatGPT/OpenAI

Export from chat.openai.com/settings → Data Controls → Export

# Auto-detect format
ctk import conversations.json --db chats.db

# Explicit format with tags
ctk import chatgpt_export.json --db chats.db --format openai --tags "work,2024"

Claude/Anthropic

ctk import claude_export.json --db chats.db --format anthropic

GitHub Copilot (from VS Code)

# Import from VS Code workspace storage
ctk import ~/.vscode/workspaceStorage --db chats.db --format copilot

# Auto-find Copilot data
python -c "from ctk.integrations.importers.copilot import CopilotImporter; \
          paths = CopilotImporter.find_copilot_data(); \
          print('\n'.join(map(str, paths)))"

Local LLM Formats (JSONL)

# Import JSONL for fine-tuning datasets
ctk import training_data.jsonl --db chats.db --format jsonl

# Batch import
for file in *.jsonl; do
    ctk import "$file" --db chats.db --format jsonl
done

Organization Features

Star Conversations

# Star for quick access
ctk star abc123 --db chats.db

# Star multiple
ctk star abc123 def456 ghi789 --db chats.db

# Unstar
ctk star --unstar abc123 --db chats.db

# List starred
ctk list --db chats.db --starred

Pin Conversations

# Pin important conversations to the top
ctk pin abc123 --db chats.db

# Unpin
ctk pin --unpin abc123 --db chats.db

# List pinned
ctk list --db chats.db --pinned

Archive Conversations

# Archive old conversations
ctk archive abc123 --db chats.db

# Unarchive
ctk archive --unarchive abc123 --db chats.db

# List archived (excluded from default views)
ctk list --db chats.db --archived

Database Operations

Merge Databases

# Combine multiple databases
ctk merge source1.db source2.db --output merged.db

# Automatically handles duplicates by conversation ID

Database Diff

# Compare two databases
ctk diff db1.db db2.db

# Shows:
# - Conversations only in db1
# - Conversations only in db2
# - Conversations with different content

Filter and Extract

# Create filtered database
ctk filter --db all_chats.db --output work_chats.db --tags "work"
ctk filter --db all_chats.db --output starred.db --starred
ctk filter --db all_chats.db --output recent.db --date-from 2024-01-01

Export for Fine-Tuning

JSONL Format

# JSONL format for local LLMs
ctk export training.jsonl --db chats.db --format jsonl

# Export the longest path from each branching conversation
ctk export responses.jsonl --db chats.db --format jsonl --path-selection longest

# Export with metadata
ctk export full_export.jsonl --db chats.db --format jsonl --include-metadata
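
JSONL means one JSON object per line, one conversation per object. The keys below are a plausible chat-style shape rather than CTK's exact schema, but they show how downstream tooling would consume the file:

import json

# Each line of the export is one complete conversation as a JSON object.
with open("training.jsonl") as f:
    for line in f:
        conv = json.loads(line)
        # e.g. {"messages": [{"role": "user", "content": "..."},
        #                    {"role": "assistant", "content": "..."}]}
        print(len(conv.get("messages", [])), "messages")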

Export with Filtering

# Export specific conversations
ctk export selected.jsonl --db chats.db --ids conv1 conv2 conv3

# Filter by source
ctk export openai_only.json --db chats.db --filter-source "ChatGPT"

# Filter by model
ctk export gpt4_convs.json --db chats.db --filter-model "GPT-4"

# Filter by tags
ctk export work_chats.json --db chats.db --filter-tags "work,important"

Path Selection for Branching Conversations

When exporting branching conversations, choose which path to include:

# Export longest path (most comprehensive)
ctk export out.jsonl --db chats.db --path-selection longest

# Export first path (original)
ctk export out.jsonl --db chats.db --path-selection first

# Export most recent path (latest regeneration)
ctk export out.jsonl --db chats.db --path-selection last

Why this matters: ChatGPT often has multiple regenerated responses. Path selection lets you choose which variant to include in your training data or export.
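
The same choice is available programmatically through the tree API covered in the Python API section below; a small sketch (assuming each path is returned as a list of messages):

from ctk import ConversationDB

# Programmatic equivalent of --path-selection.
with ConversationDB("chats.db") as db:
    conv = db.load_conversation("conv_id_123")
    paths = conv.get_all_paths()        # every root-to-leaf branch
    longest = conv.get_longest_path()   # like --path-selection longest
    first = paths[0] if paths else []   # like --path-selection first
    print(f"{len(paths)} paths; exporting the longest ({len(longest)} messages)")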

Python API

from ctk import ConversationDB

# Load conversations
with ConversationDB("chats.db") as db:
    # Search
    results = db.search_conversations("python async")

    # Load specific conversation
    conv = db.load_conversation("conv_id_123")

    # Get all paths in branching conversation
    paths = conv.get_all_paths()
    longest = conv.get_longest_path()

    # Add new message to existing conversation
    from ctk import Message, MessageContent, MessageRole

    msg = Message(
        role=MessageRole.USER,
        content=MessageContent(text="New question")
    )
    conv.add_message(msg, parent_id="previous_msg_id")
    db.save_conversation(conv)

Batch Operations

import glob
from ctk import ConversationDB, registry

# Import all exports from a directory
with ConversationDB("all_chats.db") as db:
    for file in glob.glob("exports/*.json"):
        format = "openai" if "chatgpt" in file.lower() else None
        convs = registry.import_file(file, format=format)

        for conv in convs:
            # Add file source as tag
            conv.metadata.tags.append(f"file:{file}")
            db.save_conversation(conv)

    # Get statistics
    stats = db.get_statistics()
    print(f"Imported {stats['total_conversations']} conversations")

Statistics

ctk stats --db chats.db

Output:

Database Statistics:
  Total conversations: 851
  Total messages: 25890
  Starred: 23
  Pinned: 5
  Archived: 142

Messages by role:
    assistant: 12388
    user: 9574
    system: 1632

Conversations by source:
    ChatGPT: 423
    Claude: 287
    Copilot: 141

The Long Echo Connection

CTK was built for the Long Echo project—preserving AI conversations for the long term. Key strategies:

1. Multiple Export Formats

Export to formats that will survive platform changes:

  • HTML5: Self-contained, works in any browser (even offline)
  • Markdown: Plain text with formatting, readable anywhere
  • JSON: Structured data, easy to parse decades later
  • Plain text: Ultimate fallback for maximum longevity

# Export to blog (Hugo static site)
ctk export blog/conversations.html --db life.db --format html5
ctk export blog/conversations/ --db life.db --format markdown
ctk export blog/conversations.txt --db life.db --format text

2. Physical Backups

The blog post “Long Echo” documents the full strategy:

  • USB drives given to loved ones
  • CDs/DVDs for optical redundancy
  • Multiple cloud providers (GitHub Pages, Netlify, Vercel)
  • Local NAS/backup systems

3. Format Resilience

HTML5 + JavaScript: CTK’s HTML export includes:

  • Interactive browsing interface
  • Search functionality (works offline)
  • Tree visualization for branching conversations
  • No server dependencies—pure static files

Markdown + YAML: Hugo-compatible format:

  • Browse naturally in any text editor
  • Git-friendly for version control
  • Easy to migrate to other static site generators

MCP (Model Context Protocol) Integration

CTK supports MCP for tool calling during live chat:

# Start TUI with MCP server
ctk chat --db chats.db --mcp-config mcp.json

MCP servers provide tools that the LLM can call:

  • File system operations
  • Web search
  • Database queries
  • Custom functions

Example MCP configuration:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed"],
      "env": {}
    }
  }
}

The LLM can now read files, search directories, and perform operations through MCP tools during the conversation.

Design Philosophy

🌳 Trees, Not Lists: All conversations are trees, enabling natural representation of branching

🔌 Pluggable by Default: Auto-discovery of importers/exporters makes adding providers trivial

🔒 Privacy First: 100% local, no telemetry, optional sanitization

📊 Rich by Default: Beautiful terminal output with color-coded tables

🤖 LLM-Powered: Natural language queries, auto-tagging, smart search

⚡ Fast: SQLite with proper indexing for instant search

Use Cases

Personal Knowledge Management

Import all your AI conversations and search them like a second brain:

# Find that Python trick you discussed 3 months ago
ctk search "python context manager" --db personal.db

# Or ask naturally
ctk ask "show me that conversation about decorators" --db personal.db

Fine-Tuning Datasets

Export curated conversations for training local models:

# Export starred conversations for fine-tuning
ctk filter --db all.db --output curated.db --starred
ctk export fine_tune.jsonl --db curated.db --format jsonl

Conversation Archaeology

Analyze your interaction patterns with AI over time:

from ctk import ConversationDB

with ConversationDB("life.db") as db:
    stats = db.get_statistics()

    # Most common topics
    from collections import Counter
    all_tags = []
    for conv in db.get_all_conversations():
        all_tags.extend(conv.metadata.tags)

    common = Counter(all_tags).most_common(10)
    print("Most discussed topics:", common)

Backup and Portability

Never lose your conversations when switching providers:

# Export everything to multiple formats
ctk export backup_$(date +%Y%m%d).json --db life.db --format json
ctk export archive_$(date +%Y%m%d).html --db life.db --format html5
ctk export portable_$(date +%Y%m%d).md --db life.db --format markdown

Development

# Run tests
make test

# Unit tests only
make test-unit

# Integration tests
make test-integration

# Coverage report
make coverage

# Format code (black + isort)
make format

# Lint (flake8 + mypy)
make lint

# Clean build artifacts
make clean

Roadmap

Completed ✅

  • Terminal UI with conversation management
  • Rich console output with tables
  • Natural language queries (ask command)
  • Star/pin/archive organization
  • Multiple export formats (JSONL, JSON, Markdown, HTML5)
  • MCP tool integration
  • Auto-tagging with LLM
  • Database merge/diff operations

In Progress 🔨

  • Embeddings and similarity search
  • Unit and integration test coverage
  • Performance optimization for large databases

Planned 📋

  • Web-based UI (complement to TUI)
  • Conversation deduplication utilities
  • LangChain/LlamaIndex integration
  • Advanced analytics dashboard

License

MIT


CTK: Because your conversations with AI are valuable knowledge that deserves to be preserved, searchable, and yours forever. Built for the Long Echo project—ensuring today’s conversations remain accessible decades from now.
