Dotsuite

Below you will find pages that utilize the taxonomy term “Dotsuite”

Building Languages to Solve Problems

January 19, 2026

Chapter 4 of Structure and Interpretation of Computer Programs opens with one of the most important insights in programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you’ve abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven’t just written code. You’ve created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn’t a library. It’s a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP’s fundamental criteria for any language, and they’re what separates a DSL from a mere API.

Other examples:

Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements

SICP names three mechanisms that every powerful language provides:

Primitives: What are the basic elements that cannot be broken down further?
Means of combination: How do you build compound elements from simpler ones?
Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

Primitives: numbers, symbols, operators
Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
Abstraction: named rules, rulesets, engines
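
As a rough illustration of those three ingredients, here is a minimal sketch of an evaluator for such an expression language. The nested-tuple encoding and the names `evaluate`, `OPERATORS`, and `env` are assumptions made for this example; abstraction is reduced to naming an expression, which is much simpler than the rules, rulesets, and engines listed above.

```python
import operator

# Primitive operators (a hypothetical set chosen for illustration).
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    """Evaluate a prefix expression written as nested tuples."""
    if isinstance(expr, (int, float)):   # primitive: a number
        return expr
    if isinstance(expr, str):            # primitive: a symbol, looked up by name
        return evaluate(env[expr], env)
    op, *args = expr                     # combination: (op arg1 arg2 ...)
    values = [evaluate(arg, env) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPERATORS[op](result, value)
    return result

# Abstraction: give a compound expression a name and reuse it.
env = {"x": 3, "x_plus_1": ("+", "x", 1)}
print(evaluate(("*", "x_plus_1", 2), env))  # 8
```

The evaluator never needs to know what `x_plus_1` stands for; the name is enough, which is the point of a means of abstraction.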

Or a query language for JSON documents:

JAF: Streaming Boolean Algebra Over Nested JSON

December 20, 2024

JAF (Just Another Flow) is a streaming data processing system for JSON/JSONL data. It implements boolean algebra over nested JSON structures with lazy evaluation, composable operations, and a fluent API. JAF is the production version of the concepts I explored in dotsuite.

The Relationship to Dotsuite

The short version:

dotsuite: “This is how it works.” Pedagogical, simple, learn-by-building.
JAF: “This is what you use.” Feature-complete, lazy, handles real data.

JAF implements the highest level of dotsuite’s architecture: boolean algebra over collections of nested documents. Where dotsuite teaches the concepts through isolated simple tools, JAF combines them into a unified streaming framework.

The Boolean Algebra Branch

In dotsuite’s three-pillar architecture (Depth, Truth, Shape), JAF focuses on the collections layer, specifically the boolean wing that provides filtering operations with full boolean algebra:

\[ \text{filter}: (\mathcal{D} \to \mathbb{B}) \to (C \to C) \]

Where \(\mathcal{D}\) is the document space, \(\mathbb{B}\) is boolean values, and \(C\) is a collection of documents.

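JAF lifts boolean operations to streams: AND is the intersection of filtered streams, OR is their union, and NOT is the complement. Filtering is therefore a homomorphism from predicates to collections, for example \(\text{filter}(p \land q)(C) = \text{filter}(p)(C) \cap \text{filter}(q)(C)\), which is what makes chained predicates safe to compose.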
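
The correspondence between boolean operations and set operations can be checked on a small collection. This is a sketch in plain Python, not JAF's API; `p_and`, `p_or`, `p_not`, and `filter_ids` are hypothetical helpers introduced for the example.

```python
def p_and(p, q):
    return lambda d: p(d) and q(d)

def p_or(p, q):
    return lambda d: p(d) or q(d)

def p_not(p):
    return lambda d: not p(d)

def filter_ids(pred, docs):
    """Return the set of ids of the documents that satisfy pred."""
    return {d["id"] for d in docs if pred(d)}

docs = [
    {"id": 1, "age": 30, "verified": True},
    {"id": 2, "age": 17, "verified": True},
    {"id": 3, "age": 40, "verified": False},
]
adult = lambda d: d["age"] >= 18
verified = lambda d: d["verified"]
all_ids = {d["id"] for d in docs}

# AND maps to intersection, OR to union, NOT to complement.
print(filter_ids(p_and(adult, verified), docs) == filter_ids(adult, docs) & filter_ids(verified, docs))
print(filter_ids(p_or(adult, verified), docs) == filter_ids(adult, docs) | filter_ids(verified, docs))
print(filter_ids(p_not(adult), docs) == all_ids - filter_ids(adult, docs))
```

All three comparisons print `True`.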

Core Innovation: Lazy Streaming

The Problem

Traditional data processing loads entire datasets into memory:

# Eager evaluation - loads everything
all_data = load_json("huge_file.jsonl")
filtered = [d for d in all_data if d['age'] > 25]
mapped = [transform(d) for d in filtered]

This runs out of memory on datasets larger than RAM and wastes work when you only need the first 10 results.

JAF’s Solution

from jaf import stream

# Lazy evaluation - nothing executes yet
pipeline = stream("huge_file.jsonl") \
    .filter(["gt?", "@age", 25]) \
    .map(transform) \
    .take(10)

# Only processes 10 matching items
for item in pipeline.evaluate():
    process(item)

This gives constant memory (one item is processed at a time), early termination (work stops once take(10) is satisfied), composability (complex pipelines are built declaratively), and support for infinite streams.
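
The same behavior can be sketched with plain Python generators. This is not how JAF is implemented, only an illustration of lazy evaluation and early termination; `lazy_filter`, `lazy_map`, `source`, and `seen` are names invented for the example.

```python
from itertools import count, islice

def lazy_filter(pred, items):
    for item in items:
        if pred(item):
            yield item

def lazy_map(fn, items):
    for item in items:
        yield fn(item)

seen = []  # records which source items were actually pulled

def source():
    for n in count():              # an infinite stream of documents
        seen.append(n)
        yield {"id": n, "age": n % 50}

# Building the pipeline does no work; nothing has been pulled yet.
pipeline = islice(
    lazy_map(lambda d: d["id"],
             lazy_filter(lambda d: d["age"] > 25, source())),
    3,
)

result = list(pipeline)
print(result)     # [26, 27, 28]
print(len(seen))  # 29
```

Only 29 items are pulled from an infinite source, because `islice` stops asking for more once it has three results.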

Three Query Syntaxes

JAF supports multiple query syntaxes that all compile to the same internal representation.

S-Expression Syntax (Lisp-like)

# Simple comparisons
(eq? @status "active")
(gt? @age 25)
(contains? @tags "python")

# Boolean logic
(and
    (gte? @age 18)
    (eq? @verified true))

# Nested expressions
(or (eq? @role "admin")
    (and (eq? @role "user")
         (gt? @score 100)))

Why S-expressions: unambiguous parsing (no precedence rules), easy serialization, homoiconicity (code is data), and composable ASTs.

JSON Array Syntax

# Same queries in JSON
["eq?", "@status", "active"]
["gt?", "@age", 25]

["and",
    ["gte?", "@age", 18],
    ["eq?", "@verified", true]
]

Easy to generate programmatically, standard JSON format, network-transmissible.
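
To make the array form concrete, here is a minimal sketch of an evaluator for queries written this way. It is an assumption-laden illustration, not JAF's evaluator: `resolve`, `evaluate`, and the `COMPARISONS` table are hypothetical, only a few operators are covered, and the `not` form is assumed by analogy with `and`.

```python
def resolve(doc, path):
    """Follow a dotted path such as 'user.age'; return None if it is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

COMPARISONS = {
    "eq?": lambda a, b: a == b,
    "gt?": lambda a, b: a is not None and a > b,
    "gte?": lambda a, b: a is not None and a >= b,
}

def evaluate(query, doc):
    op, *args = query
    if op == "and":
        return all(evaluate(q, doc) for q in args)
    if op == "or":
        return any(evaluate(q, doc) for q in args)
    if op == "not":
        return not evaluate(args[0], doc)
    # Operands starting with "@" are paths into the document; the rest are literals.
    values = [
        resolve(doc, a[1:]) if isinstance(a, str) and a.startswith("@") else a
        for a in args
    ]
    return COMPARISONS[op](*values)

doc = {"age": 30, "verified": True, "status": "active"}
query = ["and", ["gte?", "@age", 18], ["eq?", "@verified", True]]
print(evaluate(query, doc))  # True
```

Because the query is ordinary data, it can be built, inspected, or sent over the network before it is ever evaluated.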

Infix DSL Syntax

# Natural infix notation
@status == "active"
@age > 25 and @verified == true
@role == "admin" or (@role == "user" and @score > 100)

Human-readable, familiar, good for CLI usage. All three compile to the same AST.
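
As an illustration of how one syntax can be lowered to another, the sketch below parses the S-expression form into the nested-list form. It is a toy parser written for this example (the `TOKEN` pattern and the `atom` and `parse` functions are assumptions, not JAF's parser) and it does not handle escaped quotes or malformed input.

```python
import re

# Tokens: parentheses, double-quoted strings, or runs of other non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')

def atom(token):
    """Convert a single token to a Python value."""
    if token.startswith('"'):
        return token[1:-1]
    if token in ("true", "false"):
        return token == "true"
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return token  # operator name or @path

def parse(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        return atom(token)

    return read()

print(parse('(and (gte? @age 18) (eq? @verified true))'))
# ['and', ['gte?', '@age', 18], ['eq?', '@verified', True]]
```

The output has the same shape as the JSON array syntax, which is why a single evaluator can serve every front end.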

jsonl-algebra: Relational Algebra for Nested JSON

December 18, 2024

jsonl-algebra (command: ja) is a command-line implementation of relational algebra for JSONL data. It’s the production version of dotsuite’s dotrelate component: SQL-like operations on the command line with first-class support for nested JSON structures.

The Relationship to Dotsuite

In dotsuite’s architecture, dotrelate provides relational operations on document collections: join, union, project, difference. jsonl-algebra (ja) is the production implementation of those concepts, published on PyPI, with all relational operations plus aggregations, streaming support, schema tools, and an interactive REPL.

The Core Insight

Traditional relational algebra assumes flat tables:

SELECT name, age FROM users WHERE age > 30

But real-world JSON is deeply nested:

{
  "user": {
    "id": 1,
    "name": "Alice",
    "address": {
      "city": "NYC",
      "zip": "10001"
    }
  },
  "orders": [
    {"id": 101, "amount": 50}
  ]
}

jsonl-algebra bridges this gap by extending relational algebra with dot notation for nested access:

ja select 'user.age > 30' users.jsonl
ja project user.name,user.address.city users.jsonl
ja join users.jsonl orders.jsonl --on user.id=customer_id

The Five Core Operations

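Classical relational algebra is built from a small set of primitive operations: selection, projection, Cartesian product, union, and difference (rename is often counted as a sixth). Everything else is derived. The list below follows that set, with join standing in for the Cartesian product, since a join is a product followed by a selection.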

1. Selection (filter rows)

Mathematical notation: \(\sigma_{\text{predicate}}(R)\)

# Filter where status is "active"
ja select 'status == `"active"`' data.jsonl

# Filter on nested fields
ja select 'user.age > 30' users.jsonl

# Complex boolean logic
ja select 'price < 100 and category == `"electronics"`' products.jsonl

Selection is commutative (\(\sigma_{p_1}(\sigma_{p_2}(R)) = \sigma_{p_2}(\sigma_{p_1}(R))\)) and combinable (\(\sigma_{p_1}(\sigma_{p_2}(R)) = \sigma_{p_1 \land p_2}(R)\)).
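
Both laws are easy to check on a small relation. The sketch below uses a hypothetical `select` helper over a list of dicts; it says nothing about how ja implements selection.

```python
def select(pred, rows):
    """Keep the rows that satisfy pred."""
    return [row for row in rows if pred(row)]

rows = [
    {"price": 50, "category": "electronics"},
    {"price": 150, "category": "electronics"},
    {"price": 20, "category": "books"},
]
cheap = lambda r: r["price"] < 100
electronics = lambda r: r["category"] == "electronics"

a = select(cheap, select(electronics, rows))             # sigma_p1(sigma_p2(R))
b = select(electronics, select(cheap, rows))             # sigma_p2(sigma_p1(R))
c = select(lambda r: cheap(r) and electronics(r), rows)  # sigma_{p1 and p2}(R)
print(a == b == c)  # True
```

In practice this means a chain of `ja select` calls can be reordered or fused into one predicate without changing the result.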

2. Projection (select/compute columns)

Mathematical notation: \(\pi_{\text{columns}}(R)\)

# Pick specific fields
ja project id,name data.jsonl

# Access nested fields
ja project user.name,user.address.city users.jsonl

# Computed columns (coming soon)
ja project name,annual_income=salary*12 employees.jsonl

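Projections cascade, so a narrower projection absorbs a wider one: \(\pi_a(\pi_{a,b}(R)) = \pi_a(R)\). As a special case, projection is idempotent: \(\pi_a(\pi_a(R)) = \pi_a(R)\).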
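
The projection identity can be checked the same way. This sketch assumes a hypothetical `project` helper that works on top-level fields only and keeps duplicates (bag semantics); nested paths and ja's actual output format are out of scope here.

```python
def project(fields, rows):
    """Keep only the named top-level fields of each row (no deduplication)."""
    return [{f: row[f] for f in fields if f in row} for row in rows]

rows = [
    {"id": 1, "name": "Alice", "age": 31},
    {"id": 2, "name": "Bob", "age": 25},
]

narrow = project(["name"], rows)
print(narrow)  # [{'name': 'Alice'}, {'name': 'Bob'}]

# Projecting onto a wider field list first does not change the result.
print(project(["name"], project(["name", "id"], rows)) == narrow)  # True
# Projecting twice onto the same fields is the same as projecting once.
print(project(["name"], narrow) == narrow)  # True
```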

3. Join (combine relations)

Mathematical notation: \(R \bowtie_{\text{condition}} S\)

# Inner join on user ID
ja join users.jsonl orders.jsonl --on user.id=customer_id

# Join on nested fields
ja join posts.jsonl comments.jsonl --on post.id=comment.post_id

# Multiple join keys
ja join users.jsonl accounts.jsonl --on id=user_id,email=account_email

Inner joins are commutative and associative (up to field order in the output), so you can join multiple files in any order:

ja join users.jsonl orders.jsonl --on user.id=customer_id \
  | ja join - products.jsonl --on product_id=id
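
For intuition, an inner equi-join on dotted keys can be sketched as a hash join. Everything here is an assumption made for illustration: `resolve` and `hash_join` are hypothetical helpers, and merging the two rows with `{**left, **right}` is just one plausible output shape, not necessarily what ja emits.

```python
from collections import defaultdict

def resolve(doc, path):
    """Follow a dotted path such as 'user.id' into a nested dict."""
    for key in path.split("."):
        doc = doc[key]
    return doc

def hash_join(left, right, left_key, right_key):
    """Inner join: index the right side by key, then stream the left side."""
    index = defaultdict(list)
    for r in right:
        index[resolve(r, right_key)].append(r)
    for l in left:
        for r in index.get(resolve(l, left_key), []):
            yield {**l, **r}

users = [
    {"user": {"id": 1, "name": "Alice"}},
    {"user": {"id": 2, "name": "Bob"}},
]
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 1, "amount": 20},
]

joined = list(hash_join(users, orders, "user.id", "customer_id"))
for row in joined:
    print(row["user"]["name"], row["amount"])
```

Alice appears twice (once per order) and Bob not at all, which is the defining behavior of an inner join.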

4. Union (combine all rows)

Mathematical notation: \(R \cup S\)

# Combine employees and contractors
ja union employees.jsonl contractors.jsonl

# Union multiple sources
ja union jan.jsonl feb.jsonl mar.jsonl

5. Difference (set subtraction)

Mathematical notation: \(R - S\)

The Dot Ecosystem: From Simple Paths to Data Algebras

December 15, 2024

dotsuite is a suite of composable tools for working with nested data structures like JSON, YAML, and Python dictionaries. It started as a single helper function and grew into something with actual mathematical structure. That growth is the interesting part.

The Origin

It always starts with a simple problem. You have a nested dictionary and you need a value buried deep inside:

# Brittle code that crashes on missing keys
email = data['user']['contacts'][0]['email']  # KeyError? IndexError?

The first solution is a helper function:

# The essence of dotget - simple enough to copy
def get(data, path, default=None):
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default

This is where the story begins. That single function, once you start asking questions about what else you need, leads to a complete ecosystem for data manipulation. The trick is that the questions have a natural structure to them.

The Three Pillars

The ecosystem organizes around three fundamental questions about data:

Depth Pillar: “Where is the data?”

Tools for finding and extracting values from within documents.

Tool	Purpose	Example or notes
dotget	Simple exact paths	`get(data, "user.name")`
dotstar	Wildcard patterns	`search(data, "users.*.name")`
dotselect	Advanced selection with predicates	`find_first(data, "users[role=admin].name")`
dotpath	Extensible path engine	Powers all other tools, Turing-complete

The addressing layer forms a free algebra on selectors, with operators being morphisms in the Kleisli category of the powerset monad. In practice this means dotstar composed with dotselect still yields a well-defined set of values. You can compose these things without worrying about edge cases blowing up.
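
A rough way to see the Kleisli idea in code: each path segment maps one value to a list of values, and a path is those steps chained together. The sketch below is not dotstar's implementation; `step` and this version of `search` are hypothetical, and it uses lists (which keep order and duplicates) rather than true sets.

```python
def step(value, segment):
    """Apply one path segment to one value, returning a list of results."""
    if segment == "*":
        if isinstance(value, dict):
            return list(value.values())
        if isinstance(value, list):
            return list(value)
        return []
    if isinstance(value, dict) and segment in value:
        return [value[segment]]
    if isinstance(value, list) and segment.isdigit() and int(segment) < len(value):
        return [value[int(segment)]]
    return []  # a missing key yields no results instead of raising

def search(data, path):
    results = [data]
    for segment in path.split("."):
        # Chain the step over every intermediate result and flatten.
        results = [out for value in results for out in step(value, segment)]
    return results

data = {"users": [{"name": "Alice", "role": "admin"}, {"name": "Bob"}, {"id": 3}]}
print(search(data, "users.*.name"))  # ['Alice', 'Bob']
print(search(data, "missing.*"))     # []
```

A missing key simply contributes an empty list, which is why composition does not blow up on irregular data.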

Truth Pillar: “Is this assertion true?”

Tools for asking boolean questions and validating data.

Tool	Purpose	Example
dotexists	Path existence	`check(data, "user.email")`
dotany	Existential quantifier	`any_match(data, "users.*.role", "admin")`
dotall	Universal quantifier	`all_match(data, "users.*.status", "active")`
dotquery	Compositional logic engine	`Query("any equals role admin").check(data)`

Predicates form a Boolean algebra under conjunction, disjunction, and negation, and evaluating a predicate over a collection is a homomorphism into the set algebra of result subsets. That is what justifies short-circuit evaluation and rewriting queries with the distributive laws. The math isn’t decoration; it’s what makes the composition reliable.

Shape Pillar: “How should the data be transformed?”

Tools for reshaping and modifying data structures.

Tool	Purpose	Example or notes
dotmod	Surgical modifications	`set_(data, "user.status", "inactive")`
dotbatch	Atomic transactions	Apply multiple changes safely
dotpipe	Data transformation pipelines	Reshape documents into new forms
dotpluck	Value extraction	Create new structures from selections

Transformations are endomorphisms on the document space, and they form a monoid under composition with the no-op transformation as identity. dotmod implements lenses with put-get laws, while dotpipe provides Kleisli composition of pure functions.
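
The lens laws can be stated as executable checks. The sketch below assumes a non-mutating `set_` that returns a modified copy and a strict `get` that raises on missing paths; both are written for this example and may differ from dotmod's actual behavior.

```python
import copy

def get(data, path):
    """Read the value at a dotted path (raises if the path is missing)."""
    for key in path.split("."):
        data = data[key]
    return data

def set_(data, path, value):
    """Return a copy of data with the value at path replaced."""
    result = copy.deepcopy(data)
    *parents, last = path.split(".")
    target = result
    for key in parents:
        target = target[key]
    target[last] = value
    return result

doc = {"user": {"status": "active", "name": "Alice"}}
updated = set_(doc, "user.status", "inactive")

# put-get: reading after a write returns what was written.
print(get(updated, "user.status") == "inactive")
# get-put: writing back the value that is already there changes nothing.
print(set_(doc, "user.status", get(doc, "user.status")) == doc)
# put-put: the second write wins.
print(set_(updated, "user.status", "x") == set_(doc, "user.status", "x"))
# The original document is left untouched.
print(doc["user"]["status"] == "active")
```

All four checks print `True`.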