Building Languages to Solve Problems

January 19, 2026

Chapter 4 of Structure and Interpretation of Computer Programs opens with one of the most important insights in programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you’ve abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven’t just written code. You’ve created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn’t a library. It’s a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP’s fundamental criteria for any language, and they’re what separates a DSL from a mere API.

Other examples:

  • Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
  • Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
  • CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements

SICP identifies three necessary components for any language:

  1. Primitives: What are the basic elements that cannot be broken down further?
  2. Means of combination: How do you build compound elements from simpler ones?
  3. Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

  • Primitives: numbers, symbols, operators
  • Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
  • Abstraction: named rules, rulesets, engines

Or a query language for JSON documents:

Read More

JAF: Streaming Boolean Algebra Over Nested JSON

December 20, 2024

JAF (Just Another Flow) is a streaming data processing system for JSON/JSONL data. It implements boolean algebra over nested JSON structures with lazy evaluation, composable operations, and a fluent API. JAF is the production version of the concepts I explored in dotsuite.

The Relationship to Dotsuite

The short version:

  • dotsuite: “This is how it works.” Pedagogical, simple, learn-by-building.
  • JAF: “This is what you use.” Feature-complete, lazy, handles real data.

JAF implements the highest level of dotsuite’s architecture: boolean algebra over collections of nested documents. Where dotsuite teaches the concepts through isolated simple tools, JAF combines them into a unified streaming framework.

The Boolean Algebra Branch

In dotsuite’s three-pillar architecture (Depth, Truth, Shape), JAF focuses on the collections layer, specifically the boolean wing that provides filtering operations with full boolean algebra:

\[ \text{filter}: (\mathcal{D} \to \mathbb{B}) \to (C \to C) \]

Where \(\mathcal{D}\) is the document space, \(\mathbb{B}\) is boolean values, and \(C\) is a collection of documents.

JAF lifts boolean operations to streams: AND is intersection of filtered streams, OR is union, NOT is complement, and composition gives you chainable predicates with guaranteed homomorphism.

Core Innovation: Lazy Streaming

The Problem

Traditional data processing loads entire datasets into memory:

# Eager evaluation - loads everything
all_data = load_json("huge_file.jsonl")
filtered = [d for d in all_data if d['age'] > 25]
mapped = [transform(d) for d in filtered]

This fails on large datasets and wastes resources when you only need the first 10 results.

JAF’s Solution

from jaf import stream

# Lazy evaluation - nothing executes yet
pipeline = stream("huge_file.jsonl") \
    .filter(["gt?", "@age", 25]) \
    .map(transform) \
    .take(10)

# Only processes 10 matching items
for item in pipeline.evaluate():
    process(item)

Constant memory (processes one item at a time), early termination (stops after take(10)), composable (build complex pipelines declaratively), and works with infinite streams.

Three Query Syntaxes

JAF supports multiple query syntaxes that all compile to the same internal representation.

S-Expression Syntax (Lisp-like)

# Simple comparisons
(eq? @status "active")
(gt? @age 25)
(contains? @tags "python")

# Boolean logic
(and
    (gte? @age 18)
    (eq? @verified true))

# Nested expressions
(or (eq? @role "admin")
    (and (eq? @role "user")
         (gt? @score 100)))

S-expressions because: unambiguous parsing (no precedence rules), easy to serialize, homoiconic (code is data), composable ASTs.

JSON Array Syntax

# Same queries in JSON
["eq?", "@status", "active"]
["gt?", "@age", 25]

["and",
    ["gte?", "@age", 18],
    ["eq?", "@verified", true]
]

Easy to generate programmatically, standard JSON format, network-transmissible.

Infix DSL Syntax

# Natural infix notation
@status == "active"
@age > 25 and @verified == true
@role == "admin" or (@role == "user" and @score > 100)

Human-readable, familiar, good for CLI usage. All three compile to the same AST.

Read More