Building Languages to Solve Problems

January 19, 2026

Chapter 4 of Structure and Interpretation of Computer Programs opens with one of the most important insights in programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you’ve abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven’t just written code. You’ve created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn’t a library. It’s a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP’s fundamental criteria for any language, and they’re what separates a DSL from a mere API.

Other examples:

  • Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
  • Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
  • CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements

SICP identifies three necessary components for any language:

  1. Primitives: What are the basic elements that cannot be broken down further?
  2. Means of combination: How do you build compound elements from simpler ones?
  3. Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

  • Primitives: numbers, symbols, operators
  • Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
  • Abstraction: named rules, rulesets, engines

Or a query language for JSON documents:

Read More

How Iterators Give You N+M Instead of NxM

November 15, 2019

The problem is combinatorial. You have N algorithms (sort, search, find, copy) and M containers (array, list, tree, hash table). The naive approach: implement each algorithm for each container. That is NxM implementations.

The insight is to interpose an abstraction layer.

The Iterator Abstraction

Instead of algorithms knowing about containers directly, we define iterator categories, capabilities that algorithms require and containers provide:

Input: Single-pass read. You can advance (++) and dereference (*), but once you move forward, you cannot go back. Stream-like.

Forward: Multi-pass. You can iterate multiple times; begin() always gives the same starting point.

Bidirectional: Can go backward (--). Enables algorithms like reverse iteration.

Random-access: Can jump anywhere (+n, []). Enables binary search, sorting.

This is a hierarchy of requirements. Each level adds capabilities and enables more algorithms. An algorithm declares the weakest category it needs, and any container providing at least that category works.

A True Input Iterator

The input iterator category exists for a reason. Here is a working example that reads entropy from /dev/urandom:

#include <fstream>
#include <iterator>
#include <cstdint>

struct entropy_iterator {
    using iterator_category = std::input_iterator_tag;
    using value_type        = uint8_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const uint8_t*;
    using reference         = uint8_t;  // returns by value, not reference

    std::ifstream* source = nullptr;
    uint8_t byte = 0;

    entropy_iterator() = default;  // sentinel (end iterator)

    explicit entropy_iterator(std::ifstream& s) : source(&s) {
        ++(*this);  // prime the first byte
    }

    uint8_t operator*() const { return byte; }

    entropy_iterator& operator++() {
        if (source && source->good()) {
            source->read(reinterpret_cast<char*>(&byte), 1);
            if (!source->good()) source = nullptr;
        }
        return *this;
    }

    entropy_iterator operator++(int) {
        auto tmp = *this;
        ++(*this);
        return tmp;
    }

    bool operator==(const entropy_iterator& other) const {
        return source == other.source;
    }
};

Use it like any input iterator:

int main() {
    std::ifstream urandom("/dev/urandom", std::ios::binary);
    entropy_iterator it(urandom);

    // generate 16 random bytes
    std::vector<uint8_t> key(16);
    std::copy_n(it, 16, key.begin());

    // or use in algorithms
    int sum = 0;
    for (int i = 0; i < 1000; ++i, ++it) {
        sum += *it;
    }
    // sum ≈ 127500 (mean of uniform [0,255] × 1000)
}

Each ++ consumes a fresh entropy byte from the kernel. You literally cannot iterate twice over the same sequence. This is why the input iterator category exists: some sources are inherently single-pass. Claiming forward iterator capabilities would be a lie.

The same pattern applies to network streams, sensor readings, and any source where data is consumed by reading it.

The Payoff

Now binary_search does not need to know about vectors, deques, or sorted arrays. It only needs random-access iterators. The algorithm expresses its requirements; the container provides capabilities. They compose through the iterator abstraction.

Read More