
Everything is a File: Virtual Filesystems for CLI Data Tools

I had a bookmark manager. Then an ebook library manager. Then a chat history manager. Each started with the standard CRUD CLI:

btk add https://example.com --tags python,tutorial
btk list --tag python
btk search "async"
btk delete 1234

ebk import book.pdf --author "Knuth"
ebk list --author Knuth
ebk search "algorithms"

This works fine until you have 10,000+ bookmarks organized with hierarchical tags like programming/python/async, research/ml/transformers, work/clients/acme. Your ebook library has similar structure. Your exported chat conversations from Claude, ChatGPT, and Copilot are piling up.

Traditional CRUD commands become unwieldy:

btk list --tag programming/python/async/io --format json | jq '.[].title'
ebk list --category "Computer Science/Algorithms/Graph Theory" --limit 50
ctk search "machine learning" --source ChatGPT --date-from 2024-01-01

Each command requires precise arguments. Each tool has different flag conventions. You can’t navigate your data. You can only query it. And queries require knowing exactly what you’re looking for.

The insight: everything is a file

When I have thousands of source files organized in directories, I don’t run:

list-files --path /src/components/auth --extension .tsx

I run:

cd src/components/auth
ls *.tsx

The difference matters. With a filesystem, I can navigate incrementally (cd from general to specific), explore (ls to see what’s there), compose (cat file | grep pattern | wc -l), and use familiar tools (find, grep, xargs, pipes, redirection).

What if my bookmarks, ebooks, and chat histories were filesystems?

The pattern

Over the past year, I built six Python tools that all follow the same architecture:

Tool        Domain               VFS root structure
btk         Bookmarks            /bookmarks/, /tags/, /recent/, /domains/, /unread/, /popular/
ebk         Ebook library        /books/, /authors/, /series/, /subjects/, /recent/, /unread/
ctk         Chat conversations   /conversations/, /sources/, /topics/, /starred/, /recent/
ghops       Git repositories     /repos/, /languages/, /topics/, /stars/, /recent/
infinigram  N-gram models        /datasets/, /models/, /corpora/
AlgoTree    Tree structures      /nodes/, /paths/, /subtrees/

Each tool provides:

  1. A stateless CLI for scripting: btk bookmark add URL, ebk import book.pdf
  2. An interactive shell with a virtual filesystem: btk shell, ebk shell, ctk chat
  3. POSIX-like commands: cd, ls, pwd, cat, mv, cp, rm, find, grep
  4. Unix pipeline support: most commands output JSONL by default for piping

The interesting part is the shell.

[Embedded asciinema recording of a live btk shell session. The entire recording is 78KB of text.]

The old way

$ btk search "python async" --tag programming --limit 10
# Returns JSON blob... now what?

$ btk list --tag "programming/python/async"
# Hope I remembered the exact tag path

$ btk bookmark get 4095 --format json | jq '.tags'
# One bookmark at a time, verbose

The VFS way

$ btk shell

      __    __  __
     / /_  / /_/ /__
    / __ \/ __/ //_/
   / /_/ / /_/ ,<
  /_.___/\__/_/|_|  v0.7.1

  Bookmark Toolkit - Virtual Filesystem Shell

btk:/$ ls
bookmarks/    (10,247)    All bookmarks
tags/                    Tag hierarchy
recent/                  Time-based navigation
domains/                 Browse by domain
unread/       (2,431)    Never visited
popular/      (100)      Most visited
broken/       (14)       Dead links
starred/      (156)      Starred bookmarks

btk:/$ cd tags/programming/python

btk:/tags/programming/python$ ls
async/        (87)
web/          (124)
data/         (156)
ml/           (89)
testing/      (45)

btk:/tags/programming/python$ cd async

btk:/tags/programming/python/async$ ls | head -5
4095  5234  6012  6891  7234

btk:/tags/programming/python/async$ cat 4095/title
Real Python - Async IO in Python: A Complete Walkthrough

btk:/tags/programming/python/async$ cat 4095/url
https://realpython.com/async-io-python/

btk:/tags/programming/python/async$ star 4095
★ Starred bookmark #4095

btk:/tags/programming/python/async$ cd /recent/today/added

btk:/recent/today/added$ ls
8901  8902  8903  8904

btk:/recent/today/added$ tag 8901 8902 8903 todo
✓ Tagged 3 bookmarks

No flag memorization. Incremental exploration. Context-aware commands. It’s just directories and files.
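Under the hood, navigation like this needs a way to resolve each cd argument against the current path. Here is a minimal sketch using the standard library's posixpath, assuming virtual paths always use / separators; resolve is a hypothetical helper, not btk's actual code.

```python
import posixpath


def resolve(current: str, target: str) -> str:
    """Resolve a cd target (absolute or relative) against the current virtual path."""
    if not target:
        return current
    # Absolute targets replace the current path; relative ones are joined onto it.
    joined = target if target.startswith("/") else posixpath.join(current, target)
    # normpath collapses '.', '..' and repeated slashes ('/..' becomes '/').
    normalized = posixpath.normpath(joined)
    # POSIX normpath preserves a leading '//'; fold it back to a single root.
    return "/" + normalized.lstrip("/")


print(resolve("/tags/programming", "python/async"))  # /tags/programming/python/async
print(resolve("/tags/programming/python", ".."))     # /tags/programming
print(resolve("/tags", "/recent/today/added"))       # /recent/today/added
```

Leaning on posixpath means "..", "." and absolute jumps behave exactly as they do in a real shell, so users' habits carry over unchanged.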

Same pattern, different data

Ebooks (ebk)

ebk:/$ cd subjects/Computer\ Science/Algorithms

ebk:/subjects/Computer Science/Algorithms$ ls
Introduction to Algorithms.pdf
The Algorithm Design Manual.pdf
Algorithms (Sedgewick).pdf

ebk:/subjects/Computer Science/Algorithms$ cat "Introduction to Algorithms.pdf"/metadata
Title: Introduction to Algorithms
Authors: Cormen, Leiserson, Rivest, Stein
ISBN: 978-0262033848
Pages: 1312
Rating: 5/5

ebk:/subjects/Computer Science/Algorithms$ rate * 5
✓ Rated 3 books

Chat history (ctk)

ctk:/$ cd sources/ChatGPT

ctk:/sources/ChatGPT$ ls | wc -l
423

ctk:/sources/ChatGPT$ cd /topics/machine-learning

ctk:/topics/machine-learning$ ls
conv_a1b2c3  conv_d4e5f6  conv_g7h8i9

ctk:/topics/machine-learning$ show conv_a1b2c3
[Shows conversation tree with messages]

ctk:/topics/machine-learning$ star conv_a1b2c3
★ Starred conversation

ctk:/topics/machine-learning$ cd /starred

ctk:/starred$ export --format markdown > starred_ml_convos.md
✓ Exported 5 starred conversations to starred_ml_convos.md

Git repositories (ghops)

ghops:/$ cd languages/Python

ghops:/languages/Python$ ls
btk/  ebk/  ctk/  ghops/  infinigram/  AlgoTree/

ghops:/languages/Python$ cd btk

ghops:/languages/Python/btk$ status
Branch: master
Commits ahead: 0
Uncommitted changes: 0
Last commit: Release v0.7.1 (2 days ago)

What makes it work

After building six of these, the patterns are clear.

Stateless CLI + stateful shell

Every tool has both interfaces. The CLI is for automation and scripting. The shell is for humans.

# CLI (stateless, scriptable)
btk bookmark add https://example.com --tags python,tutorial
ebk import book.pdf --author "Knuth"
ctk export --format jsonl > training.jsonl

# Shell (stateful, exploratory)
btk shell
cd tags/python
star *
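One way to get both interfaces from a single entry point is to make the shell just another subcommand. The sketch below uses argparse and cmd.Cmd; the tool name, the add subcommand, and its output shape are hypothetical stand-ins, not the real btk CLI.

```python
import argparse
import cmd
import json


class Shell(cmd.Cmd):
    """Stateful interface: the current path persists between commands."""

    prompt = "tool:/$ "

    def __init__(self):
        super().__init__()
        self.current_path = "/"

    def do_pwd(self, arg):
        print(self.current_path)

    def do_exit(self, arg):
        return True  # a true return value ends cmdloop()


def main(argv=None):
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="add a bookmark (stateless)")
    add.add_argument("url")
    add.add_argument("--tags", default="", help="comma-separated tags")

    sub.add_parser("shell", help="start the interactive VFS shell")

    args = parser.parse_args(argv)
    if args.command == "shell":
        Shell().cmdloop()
        return None

    # Stateless path: do one thing, emit one JSON line, exit.
    record = {"url": args.url, "tags": [t for t in args.tags.split(",") if t]}
    print(json.dumps(record))
    return record
```

A console-script entry point (or an if __name__ == "__main__" guard) would call main() with the real arguments. The add branch runs once and exits; only the shell branch holds state.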

Dynamic virtual directories

Traditional filesystems are static. The VFS exposes computed views:

btk:/$ ls
unread/       (2,431)    # SELECT * FROM bookmarks WHERE visit_count = 0
popular/      (100)      # SELECT * FROM bookmarks ORDER BY visit_count DESC LIMIT 100
broken/       (14)       # SELECT * FROM bookmarks WHERE reachable = false
recent/today/added/      # SELECT * FROM bookmarks WHERE added >= date('now', 'localtime')

These “directories” don’t exist on disk. They’re queries. But they feel like directories, which is the point.
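To make the idea concrete, here is a sketch where each virtual directory is nothing more than a named SQL query, run with Python's built-in sqlite3. The bookmarks table, its columns, and the VIRTUAL_DIRS mapping are assumptions for illustration; btk's real schema may differ.

```python
import sqlite3

# Hypothetical mapping from virtual directory name to the query behind it.
VIRTUAL_DIRS = {
    "unread": "SELECT id FROM bookmarks WHERE visit_count = 0 ORDER BY id",
    "popular": "SELECT id FROM bookmarks ORDER BY visit_count DESC, id LIMIT 100",
    "broken": "SELECT id FROM bookmarks WHERE reachable = 0 ORDER BY id",
}


def list_virtual_dir(conn, name):
    """Run the query behind a virtual directory; nothing is stored on disk."""
    return [row[0] for row in conn.execute(VIRTUAL_DIRS[name])]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks ("
    "id INTEGER PRIMARY KEY, url TEXT, visit_count INTEGER, reachable INTEGER)"
)
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?, ?, ?)",
    [
        (1, "https://a.example", 0, 1),
        (2, "https://b.example", 5, 1),
        (3, "https://c.example", 0, 0),
    ],
)
print(list_virtual_dir(conn, "unread"))   # [1, 3]
print(list_virtual_dir(conn, "popular"))  # [2, 1, 3]
print(list_virtual_dir(conn, "broken"))   # [3]
```

Because listing a directory re-runs its query, the contents always reflect the current database, and adding a new view is one more dictionary entry.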

Context-aware commands

Commands understand where you are:

btk:/bookmarks/4095$ cat title
# Shows title of bookmark 4095

btk:/tags/python$ star *
# Stars all Python-tagged bookmarks

btk:/recent/today/added$ tag * review
# Tags today's additions

btk:/broken$ rm *
# Removes all broken bookmarks

The current path becomes implicit context. No need to repeat IDs or filters.
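Commands like star * and rm * need their wildcards expanded against whatever the current directory contains, since there are no real files for the OS to glob. A minimal sketch with fnmatch follows; expand_args is a hypothetical helper, and keeping unmatched patterns literally mimics the default behavior of POSIX shells.

```python
from fnmatch import fnmatchcase


def expand_args(args, entries):
    """Expand shell-style patterns against the names visible in the current directory.

    `entries` stands in for whatever the current context lists (bookmark IDs,
    file names, ...). Patterns with no match are kept literally.
    """
    expanded = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            matches = [e for e in entries if fnmatchcase(e, arg)]
            expanded.extend(matches if matches else [arg])
        else:
            expanded.append(arg)
    return expanded


today_added = ["8901", "8902", "8903", "8904"]
print(expand_args(["*", "review"], today_added))
# ['8901', '8902', '8903', '8904', 'review']
```

The command handler then receives plain IDs, so tag * review and tag 8901 8902 8903 8904 review go through the same code path.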

Hierarchical tags as directories

This is the killer feature. Hierarchical tags map directly to filesystem paths.

# Tag with hierarchy
btk tag 4095 programming/python/async/io

# Navigate the hierarchy
btk:/$ cd tags/programming
btk:/tags/programming$ ls
python/  javascript/  rust/  go/

btk:/tags/programming$ cd python/async
btk:/tags/programming/python/async$ ls
io/  frameworks/  patterns/

# Bulk operations on a hierarchy
btk:/tags/programming/python$ star */advanced/*

Compare with flat tags: python, python-async, python-async-io, python-web, python-web-django. Hierarchy gives you free navigation and organization.
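Listing a tag directory amounts to grouping tags by the path segment right after the current prefix. The sketch below does that in pure Python over hypothetical (bookmark_id, tag) pairs and counts distinct bookmarks per child, the kind of number shown next to async/ (87) earlier. A database-backed version would push the prefix match into SQL instead.

```python
from collections import defaultdict


def child_dirs(tagged, prefix):
    """Map each immediate child of `prefix` to its number of distinct bookmarks.

    `tagged` is an iterable of (bookmark_id, tag_path) pairs, e.g.
    (4095, "programming/python/async/io").
    """
    prefix_parts = [p for p in prefix.split("/") if p]
    depth = len(prefix_parts)
    members = defaultdict(set)
    for bookmark_id, tag in tagged:
        parts = tag.split("/")
        # Only tags strictly below the prefix contribute a child directory.
        if len(parts) > depth and parts[:depth] == prefix_parts:
            members[parts[depth]].add(bookmark_id)
    return {name: len(ids) for name, ids in sorted(members.items())}


tagged = [
    (4095, "programming/python/async/io"),
    (5234, "programming/python/async"),
    (6012, "programming/python/web/django"),
    (7001, "programming/rust"),
]
print(child_dirs(tagged, "programming"))         # {'python': 3, 'rust': 1}
print(child_dirs(tagged, "programming/python"))  # {'async': 2, 'web': 1}
```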

JSONL by default

Most commands output newline-delimited JSON (JSONL) by default:

# Pipe to jq, grep, awk, anything
btk list | jq 'select(.stars == true)'
ebk status | grep '"rating": 5'
ctk search "python" | jq -r '.id' | xargs ctk export --ids

# Pretty-print for humans
btk list --pretty

This makes the tools composable with the Unix ecosystem. JSONL is streamable, appendable, grepable, and robust (one malformed record doesn’t break the file).
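Here is a sketch of the two properties that make JSONL convenient: writing is one json.dumps per line, and a reader can skip a bad line without losing the rest. The starred field name is an assumption for illustration.

```python
import io
import json
import sys


def write_jsonl(records, out):
    """Write one JSON object per line: streamable and appendable."""
    for record in records:
        out.write(json.dumps(record) + "\n")


def read_jsonl(lines):
    """Yield parsed records, skipping (and reporting) malformed lines."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"skipping line {lineno}: {exc.msg}", file=sys.stderr)


buf = io.StringIO()
write_jsonl([{"id": 4095, "starred": True}, {"id": 5234, "starred": False}], buf)
# Simulate a truncated record followed by a good one.
text = buf.getvalue() + '{"id": 6012, "starred": tru\n' + '{"id": 6891, "starred": true}\n'
records = list(read_jsonl(text.splitlines()))
# Roughly what jq 'select(.starred == true)' does on the same stream:
print([r["id"] for r in records if r["starred"]])  # [4095, 6891]
```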

Implementation

The architecture is straightforward. Each tool shares the same stack:

  1. Database layer (SQLAlchemy + SQLite): normalized schema, FTS5 search, efficient indexing
  2. VFS layer (Python cmd.Cmd): path parsing, context detection, command routing
  3. Command layer: context-aware implementations of do_ls(), do_cd(), do_cat(), etc.
  4. CLI layer (Typer/argparse): stateless commands for scripting, JSONL output
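To show how the VFS layer fits on top of cmd.Cmd, here is a stripped-down shell with cd, ls, and pwd. A nested dict stands in for the database layer, and VFSShell, TREE, and the demo: prompt are hypothetical; the real tools compute directory contents from queries rather than from a static tree.

```python
import cmd
import io
import posixpath

# Stand-in for the database layer: a nested dict of virtual directories.
TREE = {
    "tags": {"programming": {"python": {}, "rust": {}}},
    "recent": {"today": {"added": {}}},
}


class VFSShell(cmd.Cmd):
    """Minimal VFS shell: path state plus cd/ls/pwd commands."""

    def __init__(self, tree, stdout=None):
        super().__init__(stdout=stdout)  # cmd.Cmd exposes this as self.stdout
        self.tree = tree
        self.current_path = "/"
        self._update_prompt()

    def _update_prompt(self):
        self.prompt = f"demo:{self.current_path}$ "

    def _node(self, path):
        """Walk the tree along `path`; return None if it does not exist."""
        node = self.tree
        for part in (p for p in path.split("/") if p):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node

    def do_cd(self, arg):
        # Relative targets are joined to the current path; absolute ones replace it.
        joined = posixpath.join(self.current_path, arg.strip() or "/")
        target = "/" + posixpath.normpath(joined).lstrip("/")
        if isinstance(self._node(target), dict):
            self.current_path = target
            self._update_prompt()
        else:
            self.stdout.write(f"cd: {arg}: no such directory\n")

    def do_ls(self, arg):
        for name in sorted(self._node(self.current_path)):
            self.stdout.write(f"{name}/\n")

    def do_pwd(self, arg):
        self.stdout.write(self.current_path + "\n")

    def do_exit(self, arg):
        return True  # a true return value ends cmdloop()

    def emptyline(self):
        pass  # cmd.Cmd would otherwise repeat the last command


out = io.StringIO()
shell = VFSShell(TREE, stdout=out)
shell.onecmd("cd tags/programming")
shell.onecmd("ls")
print(out.getvalue())
```

In a real session you would call shell.cmdloop() instead of onecmd(); onecmd() is handy for scripting and tests.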

The core of it is context detection:

def _get_context(self):
    """Determine what 'directory' we're in."""
    if self.current_path == "/":
        return {'type': 'root'}

    parts = self.current_path.strip('/').split('/')

    if parts[0] == 'tags':
        # /tags/programming/python
        tag_path = '/'.join(parts[1:])
        bookmarks = self.db.filter_by_tag_prefix(tag_path)
        return {'type': 'tag', 'tag': tag_path, 'bookmarks': bookmarks}

    elif parts[0] == 'recent' and len(parts) > 1:
        # /recent/today/added
        period = parts[1]  # 'today'
        activity = parts[2] if len(parts) > 2 else 'visited'
        bookmarks = filter_by_time_and_activity(period, activity)
        return {'type': 'recent_activity', 'period': period,
                'activity': activity, 'bookmarks': bookmarks}

    elif parts[0] == 'unread':
        # /unread - smart collection
        bookmarks = self.db.filter(visit_count=0)
        return {'type': 'smart_collection', 'name': 'unread',
                'bookmarks': bookmarks}

    elif parts[0] == 'bookmarks' and len(parts) == 2 and parts[1].isdigit():
        # /bookmarks/4095
        bookmark_id = int(parts[1])
        bookmark = self.db.get(bookmark_id)
        return {'type': 'bookmark', 'bookmark_id': bookmark_id,
                'bookmark': bookmark}

    # Anything unrecognized (e.g. a bare /recent or /bookmarks/abc) gets an
    # explicit context instead of falling off the end and returning None
    return {'type': 'unknown', 'path': self.current_path}

Once you know the context, commands adapt:

def do_ls(self, args):
    """List items in current directory."""
    context = self._get_context()

    if context['type'] == 'root':
        self._ls_root()
    elif context['type'] == 'tag':
        self._ls_tag(context['tag'], context['bookmarks'])
    elif context['type'] == 'recent_activity':
        self._ls_bookmarks(context['bookmarks'])
    elif context['type'] == 'smart_collection':
        self._ls_collection(context['name'], context['bookmarks'])
    elif context['type'] == 'bookmark':
        self._ls_bookmark(context['bookmark'])
    else:
        print(f"ls: {self.current_path}: nothing to list here")

Context detection + polymorphic commands. That’s the whole trick.
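The same polymorphism can also be written as a lookup table from context type to handler instead of an if/elif chain. This is a swapped-in variant, not how the tools above are written; the handler names and their return values are hypothetical.

```python
def ls_root(ctx):
    return ["bookmarks/", "tags/", "recent/", "unread/"]


def ls_tag(ctx):
    return [str(bookmark_id) for bookmark_id in ctx["bookmarks"]]


def ls_bookmark(ctx):
    return ["title", "url", "tags"]


# Hypothetical registry: one ls handler per context type.
LS_HANDLERS = {
    "root": ls_root,
    "tag": ls_tag,
    "bookmark": ls_bookmark,
}


def do_ls(context):
    """Dispatch on context type; unknown types get one explicit fallback."""
    handler = LS_HANDLERS.get(context["type"])
    if handler is None:
        return [f"ls: nothing to list for context '{context['type']}'"]
    return handler(context)


print(do_ls({"type": "tag", "tag": "programming/python/async", "bookmarks": [4095, 5234]}))
```

With a table, adding a new kind of virtual directory means registering handlers rather than editing every command's branch chain.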

Why it matters

Traditional CLIs force you to remember exact syntax, construct precise queries, and process JSON blobs. VFS interfaces let you explore incrementally, discover what exists, and operate on context.

It matches how humans think: spatial navigation over query construction. We already know filesystems. We already know cd, ls, grep, find. The VFS pattern lets you reuse that knowledge for any hierarchical data.

The tools

All six are open source and on PyPI:

If you build CLI tools for hierarchical data, consider the VFS pattern. Your users already know cd and ls. Why make them learn 47 flags?


Technical notes

Recording shell sessions

The interactive demo uses asciinema:

asciinema rec btk-demo.cast
btk shell
# ... do your demo ...
exit

The .cast file is plain text (newline-delimited JSON) and compact: the btk demo in this post is 78KB for several minutes of terminal activity, versus megabytes for a GIF or video. You also get interactive playback via asciinema-player and copy-pasteable text. For terminal demos, this beats screen recordings.
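Because the format is text, a recording is easy to post-process. The sketch below pulls the terminal output out of a .cast file, assuming the asciicast v2 format (a JSON header line, then one [time, code, data] array per event, where "o" marks output). Newer asciinema releases may write a later format version, which this sketch rejects rather than misreads.

```python
import json


def cast_output_text(lines):
    """Concatenate the terminal output recorded in an asciicast v2 file."""
    lines = iter(lines)
    header = json.loads(next(lines))
    if header.get("version") != 2:
        raise ValueError("expected an asciicast v2 header")
    chunks = []
    for line in lines:
        if not line.strip():
            continue
        _time, code, data = json.loads(line)
        if code == "o":  # "o" events are output written to the terminal
            chunks.append(data)
    return "".join(chunks)


sample = [
    '{"version": 2, "width": 80, "height": 24}',
    '[0.5, "o", "btk:/$ "]',
    '[1.2, "i", "ls\\r"]',
    '[1.3, "o", "bookmarks/\\r\\n"]',
]
print(repr(cast_output_text(sample)))
```

For a real file, pass the lines of open("btk-demo.cast", encoding="utf-8"). Real output also contains ANSI escape sequences, which this sketch leaves in place.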

Test coverage

All six tools have test suites. Some numbers:

  • btk: 515 tests (53% shell coverage, 23% CLI coverage)
  • ghops: 138 tests, 86% coverage
  • infinigram: 36 tests with benchmarks
  • AlgoTree: 197 tests, 86% coverage

The VFS pattern is highly testable. Path parsing, context detection, and command handlers are separate pieces that can each be tested in isolation: mock the database, test the rest.
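As a sketch of what "mock the database" looks like, the test below uses unittest.mock against a trimmed-down, hypothetical version of the tag branch of _get_context. It checks that a path becomes the right prefix query and that the root context never touches the database.

```python
import unittest
from unittest import mock


class TagContext:
    """Hypothetical, trimmed-down shell that only understands /tags paths."""

    def __init__(self, db):
        self.db = db
        self.current_path = "/"

    def _get_context(self):
        if self.current_path == "/":
            return {"type": "root"}
        parts = self.current_path.strip("/").split("/")
        if parts[0] == "tags":
            tag_path = "/".join(parts[1:])
            return {"type": "tag", "tag": tag_path,
                    "bookmarks": self.db.filter_by_tag_prefix(tag_path)}
        return {"type": "unknown", "path": self.current_path}


class ContextTests(unittest.TestCase):
    def test_tag_path_queries_db_with_prefix(self):
        db = mock.Mock()
        db.filter_by_tag_prefix.return_value = [4095]
        shell = TagContext(db)
        shell.current_path = "/tags/programming/python"
        ctx = shell._get_context()
        db.filter_by_tag_prefix.assert_called_once_with("programming/python")
        self.assertEqual(ctx["bookmarks"], [4095])

    def test_root_does_not_touch_db(self):
        db = mock.Mock()
        ctx = TagContext(db)._get_context()
        self.assertEqual(ctx, {"type": "root"})
        self.assertEqual(db.method_calls, [])
```

Run it with python -m unittest. No SQLite file or fixtures are needed, which keeps these tests fast.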
