DagShell is a virtual filesystem that organizes data by content instead of location. Identical files automatically share storage through SHA256 hashing. The structure is a directed acyclic graph rather than a tree, so the same content block can be referenced from multiple paths without duplication.
I built it because sometimes you need filesystem semantics without touching the actual disk: for testing, sandboxing, versioning, and portability. The implementation has 583 tests with 77% coverage.
The DAG structure
Traditional filesystems are trees (hard links aside): each file has exactly one parent directory. DagShell uses a DAG where content is stored once and referenced by hash:
/project/
├── src/
│   └── main.py ──────┐
├── backup/           │
│   └── main.py ──────┼──> [SHA256: abc123...] -> "print('hello')"
└── archive/          │
    └── main.py ──────┘
Three paths, one storage block.
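The sharing in the diagram comes from keying storage by the SHA-256 digest of the bytes. The sketch below is a minimal illustration of that idea, not DagShell's actual classes: `ContentStore`, `write`, and `read` are hypothetical names, and the directory tree is flattened into a path-to-hash map for brevity.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Hypothetical minimal content-addressed store (not DagShell's classes)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # SHA-256 hex digest -> content
        self.paths: Dict[str, str] = {}    # path -> SHA-256 hex digest

    def write(self, path: str, data: bytes) -> str:
        # Identical bytes always hash to the same key, so they are stored once.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.paths[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        # Two lookups: path -> hash, then hash -> bytes.
        return self.blobs[self.paths[path]]


store = ContentStore()
for p in ("/project/src/main.py", "/project/backup/main.py", "/project/archive/main.py"):
    store.write(p, b"print('hello')")

print(len(store.paths))  # 3
print(len(store.blobs))  # 1
```

Three paths resolve to one stored block, which is the property the diagram shows; a real DAG would additionally store directory nodes that map names to child hashes.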
Fluent Python API
DagShell provides a chainable API that mirrors shell commands:
from dagshell.dagshell_fluent import DagShell
shell = DagShell()
# Create project structure
(shell
.mkdir("/project/src")
.mkdir("/project/docs")
.cd("/project/src")
.echo("def main(): pass").out("main.py")
.echo("# My Project").out("../docs/README.md"))
# Navigate with directory stack
shell.pushd("/tmp")
shell.touch("scratch.txt")
shell.popd() # Back to /project/src
# Save entire filesystem to JSON
shell.save("project_snapshot.json")
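The chaining in the example works if every command method returns the shell object itself. The sketch below shows that pattern with a directory stack for `pushd`/`popd`; `FluentShell` and its in-memory dicts are hypothetical stand-ins, not DagShell's implementation, and unlike the example it requires `/tmp` to exist before `pushd`.

```python
import posixpath
from typing import Dict, List, Set


class FluentShell:
    """Hypothetical chainable shell over in-memory dicts (not DagShell's code)."""

    def __init__(self) -> None:
        self.cwd = "/"
        self.dirs: Set[str] = {"/"}
        self.files: Dict[str, str] = {}
        self._stack: List[str] = []  # pushd/popd directory stack
        self._pending = ""           # output of the last echo, awaiting out()

    def _abs(self, path: str) -> str:
        # Resolve relative paths, including "..", against the current directory.
        return posixpath.normpath(posixpath.join(self.cwd, path))

    def mkdir(self, path: str) -> "FluentShell":
        # Behaves like `mkdir -p`: every missing ancestor is created too.
        parts = self._abs(path).strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self.dirs.add("/" + "/".join(parts[:i]))
        return self  # returning self is what makes the calls chainable

    def cd(self, path: str) -> "FluentShell":
        target = self._abs(path)
        if target not in self.dirs:
            raise FileNotFoundError(target)
        self.cwd = target
        return self

    def echo(self, text: str) -> "FluentShell":
        self._pending = text + "\n"
        return self

    def out(self, path: str) -> "FluentShell":
        self.files[self._abs(path)] = self._pending
        return self

    def pushd(self, path: str) -> "FluentShell":
        previous = self.cwd
        self.cd(path)                 # raises before touching the stack if missing
        self._stack.append(previous)
        return self

    def popd(self) -> "FluentShell":
        self.cwd = self._stack.pop()  # return to where the last pushd started
        return self


shell = FluentShell()
(shell
    .mkdir("/project/src")
    .mkdir("/project/docs")
    .cd("/project/src")
    .echo("def main(): pass").out("main.py")
    .echo("# My Project").out("../docs/README.md"))
shell.mkdir("/tmp").pushd("/tmp").popd()
print(shell.cwd)  # /project/src
```

Returning `self` keeps each method an ordinary call, so the same object can be driven either as one long chain or statement by statement.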
Terminal emulator
For interactive exploration:
python -m dagshell.terminal
dagshell:/$ mkdir /home/user
dagshell:/$ cd /home/user
dagshell:/home/user$ echo "Hello" > greeting.txt
dagshell:/home/user$ cat greeting.txt
Hello
dagshell:/home/user$ ls -la
total 1
drwxr-xr-x 2 user user 4096 Aug 15 10:00 .
drwxr-xr-x 3 user user 4096 Aug 15 10:00 ..
-rw-r--r-- 1 user user 6 Aug 15 10:00 greeting.txt
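A terminal like this has to split each input line into a command, its arguments, and an optional `>` redirect before touching the filesystem. The sketch below shows one way to do that with the standard `shlex` module; `parse_command` and `run` are hypothetical helpers supporting only `echo` and `cat`, not DagShell's terminal code.

```python
import posixpath
import shlex
from typing import Dict, List, Optional, Tuple


def parse_command(line: str) -> Tuple[List[str], Optional[str]]:
    """Split a command line into argv plus an optional `>` redirect target."""
    # shlex removes the quotes ("Hello" -> Hello). Simplification: `>` must be
    # its own whitespace-separated word, and a quoted ">" is misread as a redirect.
    tokens = shlex.split(line)
    if ">" in tokens:
        i = tokens.index(">")
        if i + 1 >= len(tokens):
            raise ValueError("missing redirect target")
        return tokens[:i], tokens[i + 1]
    return tokens, None


def run(line: str, files: Dict[str, str], cwd: str) -> str:
    """Execute `echo` or `cat` against an in-memory dict; return what would print."""
    argv, target = parse_command(line)
    if argv[0] == "echo":
        output = " ".join(argv[1:]) + "\n"
    elif argv[0] == "cat":
        output = "".join(files[posixpath.join(cwd, p)] for p in argv[1:])
    else:
        raise NotImplementedError(argv[0])
    if target is not None:
        files[posixpath.join(cwd, target)] = output  # redirected: nothing printed
        return ""
    return output


files: Dict[str, str] = {}
run('echo "Hello" > greeting.txt', files, "/home/user")
print(run("cat greeting.txt", files, "/home/user"), end="")  # Hello
print(len(files["/home/user/greeting.txt"]))                  # 6
```

`echo` appends a newline, so the stored file is `Hello\n`, which is the 6 bytes `ls -la` reports for `greeting.txt` in the session above.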
Virtual devices
Standard Unix special files work:
shell.echo("garbage").out("/dev/null") # Discarded
random_bytes = shell.cat("/dev/random") # Random data
zeros = shell.head("/dev/zero", 100) # 100 zero bytes
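Emulating these devices mostly means special-casing reads and writes by path. Since `/dev/zero` and `/dev/random` never end, a read has to be bounded by a byte count, as the `head(..., 100)` call does. The sketch below is a hypothetical `DevFiles` class, not DagShell's code; it accepts writes only to `/dev/null`.

```python
import os


class DevFiles:
    """Hypothetical sketch of /dev special-file semantics (not DagShell's code)."""

    def read(self, path: str, n: int) -> bytes:
        # /dev/zero and /dev/random are endless streams, so every read
        # must say how many bytes it wants.
        if path == "/dev/null":
            return b""            # /dev/null reads as empty (immediate EOF)
        if path == "/dev/zero":
            return bytes(n)       # n zero bytes
        if path == "/dev/random":
            return os.urandom(n)  # n bytes from the OS random source
        raise FileNotFoundError(path)

    def write(self, path: str, data: bytes) -> int:
        if path == "/dev/null":
            return len(data)      # report success, keep nothing
        raise PermissionError(f"{path} is read-only in this sketch")


dev = DevFiles()
dev.write("/dev/null", b"garbage")
print(dev.read("/dev/zero", 4))  # b'\x00\x00\x00\x00'
```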
Import/export
Copy files between the real and virtual filesystems:
# Import from real filesystem
shell.import_file("/real/path/data.csv", "/virtual/data.csv")
# Export to real filesystem
shell.export_file("/virtual/results.json", "/real/path/results.json")
# Import entire directory
shell.import_dir("/real/project", "/virtual/project")
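With a hash-keyed store, importing a real directory reduces to walking it and hashing each file, and duplicate files collapse automatically. A minimal sketch under those assumptions, using `pathlib`; `import_tree` is a hypothetical function, not DagShell's `import_dir`, and it returns plain dicts rather than a DagShell filesystem.

```python
import hashlib
import posixpath
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def import_tree(real_root: str, virtual_root: str) -> Tuple[Dict[str, str], Dict[str, bytes]]:
    """Copy every regular file under real_root into an in-memory store.

    Hypothetical helper: returns (virtual path -> SHA-256 hex, hash -> bytes).
    """
    paths: Dict[str, str] = {}
    blobs: Dict[str, bytes] = {}
    root = Path(real_root)
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue  # directories are implied by file paths in this sketch
        data = file.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)           # duplicate files share one blob
        rel = file.relative_to(root).as_posix()  # "/" separators even on Windows
        paths[posixpath.join(virtual_root, rel)] = digest
    return paths, blobs


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("same")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.txt").write_text("same")
    paths, blobs = import_tree(tmp, "/virtual/project")

print(sorted(paths))  # ['/virtual/project/a.txt', '/virtual/project/sub/b.txt']
print(len(blobs))     # 1
```

Exporting is the reverse walk: for each virtual path, look up its hash and write the blob's bytes to the corresponding real path.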
Persistence
The entire filesystem state serializes to JSON:
shell.save("filesystem.json")
restored = DagShell.load("filesystem.json")
# Or get JSON directly
state = shell.to_json()
The JSON format is human-readable:
{
  "root": {
    "type": "directory",
    "children": {
      "project": {
        "type": "directory",
        "children": {
          "README.md": {
            "type": "file",
            "content_hash": "abc123..."
          }
        }
      }
    }
  },
  "content_store": {
    "abc123...": "# My Project\n..."
  }
}
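The directory tree holds only content hashes; the actual content lives in a flat store keyed by hash. Deduplication falls out naturally: two files with identical content hash to the same key, so the store keeps a single copy.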
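To see how the split between tree and store produces deduplication, here is a sketch that builds a dict shaped like the JSON example above from a flat path-to-content mapping and round-trips it through `json`. `to_snapshot` is a hypothetical helper; DagShell's real serializer may record more metadata (permissions, timestamps) than this.

```python
import hashlib
import json
from typing import Any, Dict


def to_snapshot(files: Dict[str, str]) -> Dict[str, Any]:
    """Build a {"root": ..., "content_store": ...} dict from path -> content.

    Hypothetical helper, not DagShell's serializer.
    """
    root: Dict[str, Any] = {"type": "directory", "children": {}}
    store: Dict[str, str] = {}
    for path, content in files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        store[digest] = content  # identical content -> same key -> stored once
        *dirs, name = path.strip("/").split("/")
        node = root
        for d in dirs:  # create intermediate directory nodes on demand
            node = node["children"].setdefault(d, {"type": "directory", "children": {}})
        node["children"][name] = {"type": "file", "content_hash": digest}
    return {"root": root, "content_store": store}


snapshot = to_snapshot({
    "/project/README.md": "# My Project\n",
    "/project/backup/README.md": "# My Project\n",
})
restored = json.loads(json.dumps(snapshot, indent=2))
print(len(restored["content_store"]))  # 1
```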
Scheme DSL
For Lisp people, there’s a Scheme interface:
(mkdir "/project")
(cd "/project")
(echo "Hello" "greeting.txt")
(define files (ls))
I included this partly because I like Scheme and partly because a filesystem is a natural fit for s-expressions.
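Part of why s-expressions fit is that the whole front end is small: tokenize, build nested lists, then evaluate each list as a command call. The sketch below shows that pipeline in Python. `parse`, `evaluate`, `Symbol`, and the `MiniFS` stand-in are all hypothetical, and it assumes `(echo text file)` writes `text` to `file`; it is not DagShell's Scheme interpreter.

```python
import posixpath
import re
from typing import Any, Dict, List


class Symbol(str):
    """A bare identifier such as mkdir, kept distinct from string literals."""


# One token: "(", ")", a double-quoted string (no escape sequences), or a bare atom.
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def parse(src: str) -> List[Any]:
    """Parse a sequence of s-expressions into nested Python lists."""
    src = src.rstrip()
    pos, stack = 0, [[]]
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        pos = m.end()
        lparen, rparen, string, atom = m.groups()
        if lparen:
            stack.append([])           # open a new list
        elif rparen:
            if len(stack) == 1:
                raise SyntaxError("unexpected ')'")
            done = stack.pop()         # close the innermost list
            stack[-1].append(done)
        elif string is not None:
            stack[-1].append(string)
        else:
            stack[-1].append(Symbol(atom))
    if len(stack) != 1:
        raise SyntaxError("unbalanced parentheses")
    return stack[0]


def evaluate(expr: Any, env: Dict[str, Any]) -> Any:
    if isinstance(expr, Symbol):
        return env[expr]               # variable or command lookup
    if not isinstance(expr, list):
        return expr                    # string literals evaluate to themselves
    head, *args = expr
    if isinstance(head, Symbol) and head == "define":
        env[args[0]] = evaluate(args[1], env)
        return None
    fn = evaluate(head, env)
    return fn(*(evaluate(a, env) for a in args))


class MiniFS:
    """Hypothetical in-memory stand-in for the shell the DSL would drive."""

    def __init__(self) -> None:
        self.cwd = "/"
        self.dirs = {"/"}
        self.files: Dict[str, str] = {}

    def mkdir(self, path: str) -> None:
        self.dirs.add(path)

    def cd(self, path: str) -> None:
        self.cwd = path

    def echo(self, text: str, name: str) -> None:
        self.files[posixpath.join(self.cwd, name)] = text

    def ls(self) -> List[str]:
        # Lists only files in the current directory, not subdirectories.
        return sorted(posixpath.basename(p) for p in self.files
                      if posixpath.dirname(p) == self.cwd)


fs = MiniFS()
env: Dict[str, Any] = {"mkdir": fs.mkdir, "cd": fs.cd, "echo": fs.echo, "ls": fs.ls}
program = """
(mkdir "/project")
(cd "/project")
(echo "Hello" "greeting.txt")
(define files (ls))
"""
for expr in parse(program):
    evaluate(expr, env)
print(env["files"])  # ['greeting.txt']
```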
Use cases
- Build systems: track input/output files without disk I/O
- Testing frameworks: create fixture filesystems programmatically
- Backup tools: represent filesystem snapshots efficiently
- Educational: teach filesystem concepts without system access
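For the build-system case, content hashes make change detection precise: a target is stale only if the bytes of one of its inputs changed, not merely its timestamp. The sketch below illustrates that check independently of DagShell's API; `stale_targets`, `rules`, and `last_build` are hypothetical names, and `files` stands in for content held in a virtual filesystem.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def stale_targets(
    rules: Dict[str, List[str]],
    files: Dict[str, bytes],
    last_build: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return targets whose inputs' hashes differ from those recorded at the last build.

    rules:      target -> input paths
    files:      path -> current content (e.g. held in a virtual filesystem)
    last_build: target -> {input path: hash recorded when the target was last built}
    """
    stale = []
    for target, inputs in rules.items():
        current = {p: digest(files[p]) for p in inputs}
        if last_build.get(target) != current:  # never built, or some input changed
            stale.append(target)
    return stale


rules = {"app.bin": ["main.c", "util.c"], "docs.html": ["README.md"]}
files = {"main.c": b"int main(void) { return 0; }", "util.c": b"", "README.md": b"# Docs"}
last_build = {t: {p: digest(files[p]) for p in ins} for t, ins in rules.items()}

files["util.c"] = b"void helper(void) {}"  # edit one input
files["README.md"] = b"# Docs"             # rewritten with identical bytes
print(stale_targets(rules, files, last_build))  # ['app.bin']
```

Rewriting `README.md` with the same bytes leaves its hash unchanged, so `docs.html` is not rebuilt.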
Installation
pip install dagshell
# Or, from a source checkout (run in the repository root)
pip install -e .