arkiv
Universal personal data format. JSONL in, SQL out, SQL back to JSONL. One format, one database, one query interface.
Universal personal data format. JSONL in, SQL out, MCP to LLMs.
The Format
Every record is a JSON object, one per line. All fields are optional.
{"mimetype": "text/plain", "content": "I think the key insight is...", "uri": "https://chatgpt.com/c/abc", "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user", "conversation_id": "abc"}}
{"mimetype": "audio/wav", "uri": "file://media/podcast.wav", "timestamp": "2024-01-15", "metadata": {"transcript": "Welcome to...", "duration": 45.2}}
{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg", "metadata": {"caption": "My talk at MIT"}}
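A minimal sketch of writing and reading records in this format with Python's standard library. The helper names `write_records` and `read_records` and the file name `notes.jsonl` are illustrative assumptions, not part of arkiv.

```python
import json


def write_records(path, records):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Sample records using only field names shown in the examples above.
records = [
    {"mimetype": "text/plain", "content": "I think the key insight is...",
     "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user"}},
    {"mimetype": "image/jpeg", "uri": "file://media/photo.jpg",
     "metadata": {"caption": "My talk at MIT"}},
]

if __name__ == "__main__":
    write_records("notes.jsonl", records)  # hypothetical file name
    for record in read_records("notes.jsonl"):
        # Every field is optional, so read with .get() rather than indexing.
        print(record.get("mimetype"), record.get("uri"))
```

Because no field is required, readers should tolerate missing keys, which is why the loop uses `.get()`.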
The Stack
JSONL files (canonical, portable, human-readable)
↓ arkiv import
SQLite database (queryable, efficient, standard SQL)
↓ arkiv mcp
MCP server (3 tools → any LLM)
Quick Start
pip install arkiv
# Import JSONL to SQLite
arkiv import conversations.jsonl --db archive.db
# Query
arkiv query archive.db "SELECT content FROM records WHERE metadata->>'role' = 'user' LIMIT 5"
# Serve to LLMs via MCP
arkiv mcp archive.db
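To make the JSONL-to-SQL step concrete, here is a rough sketch of the same idea using only Python's `json` and `sqlite3` modules. The table name `records` and the `content` and `metadata` columns come from the query above; the remaining columns and the helper names are assumptions for illustration, and the schema that `arkiv import` actually creates may differ (see SPEC.md).

```python
import json
import sqlite3


def load_jsonl_lines(conn, lines):
    """Insert JSONL lines into a `records` table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "mimetype TEXT, content TEXT, uri TEXT, timestamp TEXT, metadata TEXT)"
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        meta = rec.get("metadata")
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
            (
                rec.get("mimetype"),
                rec.get("content"),
                rec.get("uri"),
                rec.get("timestamp"),
                # Store metadata as JSON text so SQL JSON functions can read it.
                json.dumps(meta) if meta is not None else None,
            ),
        )
    conn.commit()


def user_messages(conn, limit=5):
    """Return content of records whose metadata role is 'user'."""
    rows = conn.execute(
        "SELECT content FROM records "
        "WHERE json_extract(metadata, '$.role') = 'user' LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_jsonl_lines(conn, [
        '{"content": "hello", "metadata": {"role": "user"}}',
        '{"content": "hi there", "metadata": {"role": "assistant"}}',
    ])
    print(user_messages(conn))
```

The sketch uses `json_extract(metadata, '$.role')` rather than the `->>` operator because `->>` requires SQLite 3.38 or newer, while `json_extract` also works on older builds that include the JSON functions.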
MCP Tools
| Tool | Description |
|---|---|
| `get_manifest()` | What collections exist, their descriptions and schemas |
| `get_schema(collection?)` | What metadata keys can be queried |
| `sql_query(query)` | Run read-only SQL |
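One common way to get read-only SQL on SQLite is to open the database with a `mode=ro` URI, so that any write statement fails at the database layer. The sketch below shows that technique with Python's `sqlite3`; the function name `run_read_only` is hypothetical, and this is not necessarily how arkiv implements `sql_query`.

```python
import sqlite3
from pathlib import Path


def run_read_only(db_path, query):
    """Run a query against a SQLite file opened in read-only mode."""
    # as_uri() yields a percent-encoded file:// URI; mode=ro asks SQLite
    # to refuse writes on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

With this approach a `SELECT` returns rows as usual, while an `INSERT`, `UPDATE`, or `DELETE` raises `sqlite3.OperationalError` instead of modifying the archive.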
Why
- Your data lives in silos (ChatGPT, email, bookmarks, photos, voice memos)
- Source toolkits (memex, mtk, btk, ptk, ebk) export it as JSONL
- arkiv gives you one format, one database, one query interface
- Any LLM can query it via MCP
- JSONL is human-readable and durable. SQLite is the most deployed database in history.
Spec
See SPEC.md for the full technical specification.