Dual | metafunctor

Below you will find pages that utilize the taxonomy term “Dual”

Duality: The Hidden Structure of Opposites

January 19, 2026

Many structures come in pairs. Recognizing duality lets you transfer insights between domains.

The Motivating Example

This collection includes two approaches to automatic differentiation:

Forward mode (in dual): Propagate derivatives alongside values, from inputs toward outputs
Reverse mode (in autodiff): Build a graph during forward evaluation, then propagate gradients backward from outputs toward inputs

These aren’t just two implementations of the same idea. They’re duals, mirror images with complementary strengths.

Forward mode computes one column of the Jacobian per pass. If \(f: \mathbb{R}^n \to \mathbb{R}^m\), computing the full Jacobian takes \(n\) passes. Reverse mode computes one row per pass, \(m\) passes for the full Jacobian.

For neural network training, we have many inputs (millions of parameters) and one output (the loss). Reverse mode wins overwhelmingly: one backward pass gives all gradients. This is why backpropagation dominates deep learning.

For sensitivity analysis with few parameters and many outputs, forward mode wins. Same algorithm structure, opposite traversal direction, complementary use cases.

The mathematical explanation: forward mode computes Jacobian-vector products (\(Jv\)); reverse mode computes vector-Jacobian products (\(v^T J\)). These are transposes of each other. Duality is transposition.

Push vs Pull

Consider two ways to traverse a sequence:

Pull (iterator/consumer controls):

for (auto it = seq.begin(); it != seq.end(); ++it) {
    process(*it);  // Consumer pulls each element
}

Push (producer controls):

seq.for_each([](auto x) {
    process(x);  // Producer pushes each element
});

Same traversal. Same elements processed. But control flow is reversed:

Aspect	Pull (Iterator)	Push (Generator)
Who controls pace?	Consumer	Producer
Suspend/resume?	Consumer decides when to call `++`	Producer decides when to yield
Backpressure	Natural (just stop pulling)	Must be designed in
Composition	Chain iterators	Chain callbacks

C++ ranges are pull-based: view | filter | transform creates an iterator that pulls through the pipeline. Reactive streams (Rx) are push-based: events flow through a pipeline of observers.

These are duals. Given a pull-based algorithm, you can mechanically derive its push-based counterpart by reversing who initiates each step. The transformation preserves correctness because it’s just changing direction, not content.

Encode vs Decode

Compression algorithms come in pairs:

// Encoder: structure -> bits
auto encode(const Document& doc) -> Bitstream;

// Decoder: bits -> structure
auto decode(const Bitstream& bits) -> Document;

These must be inverses: decode(encode(x)) == x. But their implementations are often strikingly different:

Seeing Structure First

January 18, 2026

A reflection on eleven explorations in generic programming

The Question Behind the Code

What do these computations have in common?

Computing the millionth Fibonacci number
Finding the shortest path between cities in a weighted graph
Calculating compound interest over thirty years
Composing ten 3D rotations into one
Repeating a string n times

The answer: they’re all computed by the same twenty lines of code.

template<typename T>
constexpr T power(T const& base, T exp) {
    if (exp == zero(exp)) return one(exp);
    if (exp == one(exp))  return base;

    return even(exp)
        ? square(power(base, half(exp)))
        : product(base, power(base, decrement(exp)));
}

This shouldn’t work. Fibonacci numbers involve integer sequences. Shortest paths involve graphs. Rotations involve 3D geometry. Different domains, different mathematics.

Yet they share structure. Once you see it, a single algorithm serves them all.

This collection of eleven blog posts is an extended meditation on one idea: algorithms arise from algebraic structure. The posts cover different domains (number theory, calculus, linear algebra, polymorphism) but they circle the same insight. Recognize the structure; the algorithm follows.

The Principle

Alex Stepanov articulated this most clearly in Elements of Programming: “Generic programming is about abstracting and classifying algorithms and data structures.” But the deeper point is how to abstract. Not by common syntax or superficial similarity, but by the algebraic laws a type obeys.

Why does structure appear everywhere? Because reality has structure. The algebraic structures we discover in programming (groups, rings, monoids) are the same structures physicists discover in nature. Rotations form a group. Spacetime transformations form a group. This isn’t coincidence. We’re uncovering patterns that exist.

Noether’s theorem makes this precise: every continuous symmetry corresponds to a conservation law. Time-translation symmetry gives conservation of energy. Space-translation symmetry gives conservation of momentum. Rotational symmetry gives conservation of angular momentum. The symmetry groups of physics are algebraic structures.

When we recognize “this is a monoid” in our code, we’re tapping into the same mathematical substrate that governs physical law. The algorithms follow because the structure constrains what’s possible, both in computation and in nature.

Consider the power() function above. What does it require?

An associative binary operation (so we can regroup: \((a \cdot b) \cdot c = a \cdot (b \cdot c)\))
An identity element (so \(1 \cdot x = x \cdot 1 = x\))
Halving and parity testing on the exponent

That’s it. Any type providing these operations, with these laws, can use this algorithm. The requirements are algebraic, not syntactic.

Differentiation: Three Ways

January 15, 2025

A synthesis of three earlier posts, comparing forward-mode AD, reverse-mode AD, and numerical differentiation.

Computing derivatives shows up everywhere: optimization, machine learning, physics simulation, numerical analysis. This series has explored three distinct approaches:

Forward-mode AD via dual numbers
Reverse-mode AD via computational graphs
Numerical differentiation via finite differences

Each has different strengths. The right choice depends on the shape of your problem.

The Landscape

Method	Accuracy	Cost for \(f: \mathbb{R}^n \to \mathbb{R}\)	Cost for \(f: \mathbb{R} \to \mathbb{R}^m\)	Memory
Forward AD	Exact	\(O(n)\) passes	\(O(1)\) pass	\(O(1)\)
Reverse AD	Exact	\(O(1)\) pass	\(O(m)\) passes	\(O(\text{ops})\)
Finite Diff	\(O(h^p)\)	\(O(n)\) evaluations	\(O(n)\) evaluations	\(O(1)\)

The key point: problem structure determines the best method.

Forward-Mode AD: Dual Numbers

Forward-mode AD extends numbers with an infinitesimal \(\varepsilon\) where \(\varepsilon^2 = 0\). The derivative falls out of the arithmetic for free:

// f(x) = x^3 - 3x + 1
// f'(x) = 3x^2 - 3

auto x = dual<double>::variable(2.0);  // x = 2, dx = 1
auto f = x*x*x - 3.0*x + 1.0;

std::cout << f.value() << "\n";       // 3.0
std::cout << f.derivative() << "\n";  // 9.0

Strengths:

Simple implementation (operator overloading)
No memory overhead
Naturally composable for higher derivatives
Works with any function of overloaded operators

When to use:

Single input variable (or few inputs)
Computing Jacobian-vector products
Higher-order derivatives via nesting
Sensitivity analysis along one direction

Complexity: One forward pass per input variable. For f: R^n -> R^m, computing the full Jacobian requires n passes.

Reverse-Mode AD: Computational Graphs

Reverse-mode AD builds a computational graph during the forward pass, then propagates gradients backward via the chain rule:

auto f = [](const auto& x) {
    return sum(pow(x, 2.0));  // f(x) = sum(x^2)
};

auto df = grad(f);  // Returns gradient function
auto gradient = df(x);  // One backward pass for all partials

Strengths:

O(1) backward passes regardless of input dimension
Powers modern deep learning (backpropagation)
Efficient for loss functions: f: R^n -> R

When to use:

Many inputs, scalar output (neural networks)
Computing vector-Jacobian products
Optimization where you need the full gradient

Complexity: One forward pass to build the graph, one backward pass to compute all gradients. Memory scales with the number of operations because you have to store intermediate values.

Numerical Differentiation: Finite Differences

Approximate the derivative using the limit definition:

// Central difference: f'(x) ~ (f(x+h) - f(x-h)) / 2h
double df = central_difference(f, x);

Strengths: