Building Languages to Solve Problems

January 19, 2026

Chapter 4 of Structure and Interpretation of Computer Programs opens with one of the most important insights in programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you’ve abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven’t just written code. You’ve created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn’t a library. It’s a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP’s fundamental criteria for any language, and they’re what separates a DSL from a mere API.

Other examples:

  • Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
  • Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
  • CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements

SICP identifies three necessary components for any language:

  1. Primitives: What are the basic elements that cannot be broken down further?
  2. Means of combination: How do you build compound elements from simpler ones?
  3. Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

  • Primitives: numbers, symbols, operators
  • Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
  • Abstraction: named rules, rulesets, engines

Or a query language for JSON documents:

Read More

jsonl-algebra: Relational Algebra for Nested JSON

December 18, 2024

jsonl-algebra (command: ja) is a command-line implementation of relational algebra for JSONL data. It’s the production version of dotsuite’s dotrelate component: SQL-like operations on the command line with first-class support for nested JSON structures.

The Relationship to Dotsuite

In dotsuite’s architecture, dotrelate provides relational operations on document collections: join, union, project, difference. jsonl-algebra (ja) is the production implementation of those concepts, published on PyPI, with all relational operations plus aggregations, streaming support, schema tools, and an interactive REPL.

The Core Insight

Traditional relational algebra assumes flat tables:

SELECT name, age FROM users WHERE age > 30

But real-world JSON is deeply nested:

{
  "user": {
    "id": 1,
    "name": "Alice",
    "address": {
      "city": "NYC",
      "zip": "10001"
    }
  },
  "orders": [
    {"id": 101, "amount": 50}
  ]
}

jsonl-algebra bridges this gap by extending relational algebra with dot notation for nested access:

ja select 'user.age > 30' users.jsonl
ja project user.name,user.address.city users.jsonl
ja join users.jsonl orders.jsonl --on user.id=customer_id

The Five Core Operations

Relational algebra has five fundamental operations that form a complete algebra. Everything else is derived.

1. Selection (filter rows)

Mathematical notation: \(\sigma_{\text{predicate}}(R)\)

# Filter where status is "active"
ja select 'status == `"active"`' data.jsonl

# Filter on nested fields
ja select 'user.age > 30' users.jsonl

# Complex boolean logic
ja select 'price < 100 and category == `"electronics"`' products.jsonl

Selection is commutative (\(\sigma_{p_1}(\sigma_{p_2}(R)) = \sigma_{p_2}(\sigma_{p_1}(R))\)) and combinable (\(\sigma_{p_1}(\sigma_{p_2}(R)) = \sigma_{p_1 \land p_2}(R)\)).

2. Projection (select/compute columns)

Mathematical notation: \(\pi_{\text{columns}}(R)\)

# Pick specific fields
ja project id,name data.jsonl

# Access nested fields
ja project user.name,user.address.city users.jsonl

# Computed columns (coming soon)
ja project name,annual_income=salary*12 employees.jsonl

Idempotent for simple projections: \(\pi_a(\pi_{a,b}(R)) = \pi_a(R)\).

3. Join (combine relations)

Mathematical notation: \(R \bowtie_{\text{condition}} S\)

# Inner join on user ID
ja join users.jsonl orders.jsonl --on user.id=customer_id

# Join on nested fields
ja join posts.jsonl comments.jsonl --on post.id=comment.post_id

# Multiple join keys
ja join users.jsonl accounts.jsonl --on id=user_id,email=account_email

Commutative and associative, so you can join multiple files in any order:

ja join users.jsonl orders.jsonl --on user.id=customer_id \
  | ja join - products.jsonl --on product_id=id

4. Union (combine all rows)

Mathematical notation: \(R \cup S\)

# Combine employees and contractors
ja union employees.jsonl contractors.jsonl

# Union multiple sources
ja union jan.jsonl feb.jsonl mar.jsonl

5. Difference (set subtraction)

Mathematical notation: \(R - S\)

Read More