Below you will find pages that utilize the taxonomy term “Dual”
Duality: The Hidden Structure of Opposites
January 19, 2026
Many structures come in pairs. Recognizing duality lets you transfer insights between domains.
The Motivating Example
This collection includes two approaches to automatic differentiation:
- Forward mode (in dual): Propagate derivatives alongside values, from inputs toward outputs
- Reverse mode (in autodiff): Build a graph during forward evaluation, then propagate gradients backward from outputs toward inputs
These aren’t just two implementations of the same idea. They’re duals, mirror images with complementary strengths.
Forward mode computes one column of the Jacobian per pass. If \(f: \mathbb{R}^n \to \mathbb{R}^m\), computing the full Jacobian takes \(n\) passes. Reverse mode computes one row per pass, \(m\) passes for the full Jacobian.
For neural network training, we have many inputs (millions of parameters) and one output (the loss). Reverse mode wins overwhelmingly: one backward pass gives all gradients. This is why backpropagation dominates deep learning.
For sensitivity analysis with few parameters and many outputs, forward mode wins. Same algorithm structure, opposite traversal direction, complementary use cases.
The mathematical explanation: forward mode computes Jacobian-vector products (\(Jv\)); reverse mode computes vector-Jacobian products (\(v^T J\)). These are transposes of each other. Duality is transposition.
Push vs Pull
Consider two ways to traverse a sequence:
Pull (iterator/consumer controls):
for (auto it = seq.begin(); it != seq.end(); ++it) {
process(*it); // Consumer pulls each element
}
Push (producer controls):
seq.for_each([](auto x) {
process(x); // Producer pushes each element
});
Same traversal. Same elements processed. But control flow is reversed:
| Aspect | Pull (Iterator) | Push (Generator) |
|---|---|---|
| Who controls pace? | Consumer | Producer |
| Suspend/resume? | Consumer decides when to call ++ | Producer decides when to yield |
| Backpressure | Natural (just stop pulling) | Must be designed in |
| Composition | Chain iterators | Chain callbacks |
C++ ranges are pull-based: view | filter | transform creates an iterator that pulls through the pipeline. Reactive streams (Rx) are push-based: events flow through a pipeline of observers.
These are duals. Given a pull-based algorithm, you can mechanically derive its push-based counterpart by reversing who initiates each step. The transformation preserves correctness because it’s just changing direction, not content.
Encode vs Decode
Compression algorithms come in pairs:
// Encoder: structure -> bits
auto encode(const Document& doc) -> Bitstream;
// Decoder: bits -> structure
auto decode(const Bitstream& bits) -> Document;
These must be inverses: decode(encode(x)) == x. But their implementations are often strikingly different:
Seeing Structure First
January 18, 2026
A reflection on eleven explorations in generic programming
The Question Behind the Code
What do these computations have in common?
- Computing the millionth Fibonacci number
- Finding the shortest path between cities in a weighted graph
- Calculating compound interest over thirty years
- Composing ten 3D rotations into one
- Repeating a string n times
The answer: they’re all computed by the same twenty lines of code.
template<typename T>
constexpr T power(T const& base, T exp) {
if (exp == zero(exp)) return one(exp);
if (exp == one(exp)) return base;
return even(exp)
? square(power(base, half(exp)))
: product(base, power(base, decrement(exp)));
}
This shouldn’t work. Fibonacci numbers involve integer sequences. Shortest paths involve graphs. Rotations involve 3D geometry. Different domains, different mathematics.
Yet they share structure. Once you see it, a single algorithm serves them all.
This collection of eleven blog posts is an extended meditation on one idea: algorithms arise from algebraic structure. The posts cover different domains (number theory, calculus, linear algebra, polymorphism) but they circle the same insight. Recognize the structure; the algorithm follows.
The Principle
Alex Stepanov articulated this most clearly in Elements of Programming: “Generic programming is about abstracting and classifying algorithms and data structures.” But the deeper point is how to abstract. Not by common syntax or superficial similarity, but by the algebraic laws a type obeys.
Why does structure appear everywhere? Because reality has structure. The algebraic structures we discover in programming (groups, rings, monoids) are the same structures physicists discover in nature. Rotations form a group. Spacetime transformations form a group. This isn’t coincidence. We’re uncovering patterns that exist.
Noether’s theorem makes this precise: every continuous symmetry corresponds to a conservation law. Time-translation symmetry gives conservation of energy. Space-translation symmetry gives conservation of momentum. Rotational symmetry gives conservation of angular momentum. The symmetry groups of physics are algebraic structures.
When we recognize “this is a monoid” in our code, we’re tapping into the same mathematical substrate that governs physical law. The algorithms follow because the structure constrains what’s possible, both in computation and in nature.
Consider the power() function above. What does it require?
- An associative binary operation (so we can regroup: \((a \cdot b) \cdot c = a \cdot (b \cdot c)\))
- An identity element (so \(1 \cdot x = x \cdot 1 = x\))
- Halving and parity testing on the exponent
That’s it. Any type providing these operations, with these laws, can use this algorithm. The requirements are algebraic, not syntactic.
Differentiation: Three Ways
January 15, 2025
A synthesis of three earlier posts, comparing forward-mode AD, reverse-mode AD, and numerical differentiation.
Computing derivatives shows up everywhere: optimization, machine learning, physics simulation, numerical analysis. This series has explored three distinct approaches:
- Forward-mode AD via dual numbers
- Reverse-mode AD via computational graphs
- Numerical differentiation via finite differences
Each has different strengths. The right choice depends on the shape of your problem.
The Landscape
| Method | Accuracy | Cost for \(f: \mathbb{R}^n \to \mathbb{R}\) | Cost for \(f: \mathbb{R} \to \mathbb{R}^m\) | Memory |
|---|---|---|---|---|
| Forward AD | Exact | \(O(n)\) passes | \(O(1)\) pass | \(O(1)\) |
| Reverse AD | Exact | \(O(1)\) pass | \(O(m)\) passes | \(O(\text{ops})\) |
| Finite Diff | \(O(h^p)\) | \(O(n)\) evaluations | \(O(n)\) evaluations | \(O(1)\) |
The key point: problem structure determines the best method.
Forward-Mode AD: Dual Numbers
Forward-mode AD extends numbers with an infinitesimal \(\varepsilon\) where \(\varepsilon^2 = 0\). The derivative falls out of the arithmetic for free:
// f(x) = x^3 - 3x + 1
// f'(x) = 3x^2 - 3
auto x = dual<double>::variable(2.0); // x = 2, dx = 1
auto f = x*x*x - 3.0*x + 1.0;
std::cout << f.value() << "\n"; // 3.0
std::cout << f.derivative() << "\n"; // 9.0
Strengths:
- Simple implementation (operator overloading)
- No memory overhead
- Naturally composable for higher derivatives
- Works with any function of overloaded operators
When to use:
- Single input variable (or few inputs)
- Computing Jacobian-vector products
- Higher-order derivatives via nesting
- Sensitivity analysis along one direction
Complexity: One forward pass per input variable. For f: R^n -> R^m, computing the full Jacobian requires n passes.
Reverse-Mode AD: Computational Graphs
Reverse-mode AD builds a computational graph during the forward pass, then propagates gradients backward via the chain rule:
auto f = [](const auto& x) {
return sum(pow(x, 2.0)); // f(x) = sum(x^2)
};
auto df = grad(f); // Returns gradient function
auto gradient = df(x); // One backward pass for all partials
Strengths:
- O(1) backward passes regardless of input dimension
- Powers modern deep learning (backpropagation)
- Efficient for loss functions: f: R^n -> R
When to use:
- Many inputs, scalar output (neural networks)
- Computing vector-Jacobian products
- Optimization where you need the full gradient
Complexity: One forward pass to build the graph, one backward pass to compute all gradients. Memory scales with the number of operations because you have to store intermediate values.
Numerical Differentiation: Finite Differences
Approximate the derivative using the limit definition:
// Central difference: f'(x) ~ (f(x+h) - f(x-h)) / 2h
double df = central_difference(f, x);
Strengths: