These posts explore the philosophy and practice of generic programming as taught by Alex Stepanov, architect of the C++ Standard Template Library.

The central insight: algorithms arise from algebraic structure. The same algorithm that works on integers works on matrices, polynomials, and modular integers—not by accident, but because they share algebraic properties. When you recognize “this is a monoid” or “this is a Euclidean domain,” you immediately know which algorithms apply.

Each post demonstrates one beautiful idea with minimal, pedagogical C++ code (~100-300 lines). The implementations prioritize clarity over cleverness—code that reads like a textbook that happens to compile.

Core Topics

  • Monoids and exponentiation: The Russian peasant algorithm reveals the universal structure of repeated operations
  • Rings and fields: Modular arithmetic teaches which algorithms apply where
  • Euclidean domains: The same GCD algorithm works for integers and polynomials
  • Generic concepts: C++20 concepts express algebraic requirements as compile-time contracts

Further Reading

  • Stepanov & Rose, From Mathematics to Generic Programming
  • Stepanov & McJones, Elements of Programming
Posts

  • A synthesis of three earlier posts, comparing forward-mode AD, reverse-mode AD, and numerical differentiation.

    Computing derivatives is fundamental to optimization, machine learning, physics simulation, and numerical analysis. This series has explored three distinct approaches:

    1. Forward-mode AD via dual numbers
    2. Reverse-mode AD via computational graphs
    3. Numerical differentiation via finite differences

    Each has different strengths. Understanding when to use each is essential for practical numerical computing.

    The Landscape

    Method       Accuracy  Cost for f: R^n -> R  Cost for f: R -> R^m  Memory
    Forward AD   Exact     O(n) passes           O(1) pass             O(1)
    Reverse AD   Exact     O(1) pass             O(m) passes           O(ops)
    Finite Diff  O(h^p)    O(n) evaluations      O(n) evaluations      O(1)

    The key insight: problem structure determines the best method.

    Forward-Mode AD: Dual Numbers

    Forward-mode AD extends numbers with an infinitesimal epsilon where epsilon^2 = 0. The derivative emerges from arithmetic:

    // f(x) = x^3 - 3x + 1
    // f'(x) = 3x^2 - 3
    
    auto x = dual<double>::variable(2.0);  // x = 2, dx = 1
    auto f = x*x*x - 3.0*x + 1.0;
    
    std::cout << f.value() << "\n";       // 3.0
    std::cout << f.derivative() << "\n";  // 9.0
    

    Strengths:

    • Simple implementation (operator overloading)
    • No memory overhead
    • Naturally composable for higher derivatives
    • Works with any function of overloaded operators

    When to use:

    • Single input variable (or few inputs)
    • Computing Jacobian-vector products
    • Higher-order derivatives via nesting
    • Sensitivity analysis along one direction

    Complexity: One forward pass per input variable. For f: R^n -> R^m, computing the full Jacobian requires n passes.

    Reverse-Mode AD: Computational Graphs

    Reverse-mode AD builds a computational graph during the forward pass, then propagates gradients backward via the chain rule:

    auto f = [](const auto& x) {
        return sum(pow(x, 2.0));  // f(x) = sum(x^2)
    };
    
    auto df = grad(f);  // Returns gradient function
    auto gradient = df(x);  // One backward pass for all partials
    

    Strengths:

    • O(1) backward passes regardless of input dimension
    • Powers modern deep learning (backpropagation)
    • Efficient for loss functions: f: R^n -> R

    When to use:

    • Many inputs, scalar output (neural networks)
    • Computing vector-Jacobian products
    • Optimization where you need full gradient

    Complexity: One forward pass to build graph, one backward pass to compute all gradients. Memory scales with number of operations (must store intermediate values).

    Numerical Differentiation: Finite Differences

    Approximate the derivative using the limit definition:

    // Central difference: f'(x) ~ (f(x+h) - f(x-h)) / 2h
    double df = central_difference(f, x);
    

    Strengths:

    • Works with black-box functions
    • No special types required
    • Simple to implement and understand

    Limitations:

    Read More →
  • Sean Parent’s technique for value-semantic polymorphism, extended to algebraic structures

    The Problem

    You want a container that holds different types:

    std::vector<???> items;
    items.push_back(42);
    items.push_back(std::string("hello"));
    items.push_back(MyCustomType{});
    

    Traditional OOP says: make everything inherit from a base class. But that’s intrusive—you can’t add int to your hierarchy.

    The Solution: Type Erasure

    Store the interface in an abstract class, but wrap any type that provides the needed operations:

    class any_regular {
        struct concept_t {
            virtual ~concept_t() = default;
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual bool equals(concept_t const&) const = 0;
        };
    
        template<typename T>
        struct model_t : concept_t {
            T value;
            // Implement concept_t using T's operations
        };
    
        std::unique_ptr<concept_t> self_;
    };
    

    The magic: model_t<int>, model_t<string>, model_t<MyType> are all different types, but they all inherit from concept_t. We’ve moved the inheritance inside the wrapper.

    The Pattern

    1. concept_t: Abstract base defining required operations
    2. model_t: Template that wraps any type T and implements concept_t
    3. Wrapper class: Holds unique_ptr<concept_t>, provides value semantics

    The wrapper looks like a value but can hold any type:

    any_regular a = 42;           // Stores model_t<int>
    any_regular b = "hello"s;     // Stores model_t<string>
    any_regular c = a;            // Deep copy via clone()
    

    Value Semantics

    The key is clone(). Copying the wrapper deep-copies the held value:

    any_regular(any_regular const& other)
        : self_(other.self_ ? other.self_->clone() : nullptr) {}
    

    This gives us value semantics: copies are independent, assignment replaces content, no shared state surprises.

    Beyond Regular: Algebraic Interfaces

    Here’s where Stepanov’s insight applies. What if we type-erase algebraic structure?

    any_ring: Erasing Ring Operations

    class any_ring {
        struct concept_t {
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> multiply(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> negate() const = 0;
            virtual std::unique_ptr<concept_t> zero() const = 0;
            virtual std::unique_ptr<concept_t> one() const = 0;
            virtual bool equals(concept_t const&) const = 0;
        };
        // ...
    };
    

    Now any_ring can hold integers, polynomials, matrices, or any ring—and you can do arithmetic on it at runtime:

    any_ring a = 42;
    any_ring b = polynomial{1, 2, 3};  // Different types!
    // a + b would throw (type mismatch), but:
    any_ring c = a + any_ring(10);     // Works: 52
    

    any_vector_space: Linear Algebra Abstraction

    A vector space over a field F has:

    • Vector addition: v + w
    • Scalar multiplication: alpha * v
    • Zero vector: 0

    template<typename Scalar>
    class any_vector {
        struct concept_t {
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> scale(Scalar) const = 0;
            virtual std::unique_ptr<concept_t> zero() const = 0;
            virtual Scalar inner_product(concept_t const&) const = 0;
        };
        // ...
    };
    

    This lets you write algorithms that work on any vector space:

    // Gram-Schmidt orthogonalization - works on any inner product space
    template<typename Scalar>
    std::vector<any_vector<Scalar>> gram_schmidt(
        std::vector<any_vector<Scalar>> const& basis
    ) {
        std::vector<any_vector<Scalar>> orthonormal;
        for (auto const& v : basis) {
            auto u = v;
            for (auto const& e : orthonormal) {
                // u = u - <u,e>*e
                u = u - e.scale(u.inner_product(e));
            }
            // Normalize u
            Scalar norm = std::sqrt(u.inner_product(u));
            orthonormal.push_back(u.scale(1.0 / norm));
        }
        return orthonormal;
    }
    

    The Stepanov Connection

    This is Stepanov’s approach at the type level:

    1. Identify the algebraic structure (ring, vector space, etc.)
    2. Define operations axiomatically (not by inheritance)
    3. Let any type that provides the operations participate

    The concept/model pattern makes this work at runtime. It’s the same philosophy as C++20 concepts, but with runtime dispatch instead of compile-time.

    Read More →
  • Numerical integration (quadrature) for C++20.

    Overview

    The definite integral represents the signed area under a curve:

    integral from a to b of f(x) dx
    

    Since most functions lack closed-form antiderivatives, we approximate integrals numerically using quadrature rules—weighted sums of function evaluations:

    integral from a to b of f(x) dx ~ sum of w_i * f(x_i)
    

    Different rules choose different nodes x_i and weights w_i, trading off accuracy and computational cost.

    Quick Start

    #include <integration/integrate.hpp>
    #include <cmath>
    #include <iostream>
    
    int main() {
        using namespace integration;
    
        // integral from 0 to pi of sin(x) dx = 2
        double result = integrate([](double x) { return std::sin(x); }, 0.0, 3.14159265);
    
        std::cout << "integral of sin(x) dx = " << result << "\n";  // ~2.0
    }
    

    The Quadrature Zoo

    Basic Rules

    Rule         Formula                    Error   Exact for
    Midpoint     (b-a)*f(m)                 O(h^3)  Linear
    Trapezoidal  (b-a)/2*(f(a)+f(b))        O(h^3)  Linear
    Simpson’s    (b-a)/6*(f(a)+4f(m)+f(b))  O(h^5)  Cubic

    double m = midpoint_rule(f, a, b);
    double t = trapezoidal_rule(f, a, b);
    double s = simpsons_rule(f, a, b);
    

    Composite Rules

    Divide [a,b] into n subintervals and apply the basic rule to each:

    // Error: O(h^2) where h = (b-a)/n
    double m = composite_midpoint(f, a, b, 100);
    double t = composite_trapezoidal(f, a, b, 100);
    
    // Error: O(h^4) - much more accurate!
    double s = composite_simpsons(f, a, b, 100);  // n must be even
    

    Gauss-Legendre Quadrature

    The optimal choice: n points exactly integrate polynomials of degree 2n-1.

    // 5-point Gauss-Legendre: exact for degree <= 9
    double g = gauss_legendre<5>(f, a, b);
    

    Adaptive Integration

    Automatically refines where the function is “difficult”:

    // Recommended for general use
    double result = integrate(f, a, b);              // Default tolerance 1e-10
    double result = integrate(f, a, b, 1e-12);       // Custom tolerance
    
    // With error estimate
    auto [value, error, evals] = integrate_with_error(f, a, b);
    

    Deriving Simpson’s Rule

    Taylor expand f(x) around the midpoint m = (a+b)/2. Odd powers vanish by symmetry when integrating from a to b:

    integral from a to b of f(x) dx ~ (b-a)f(m) + f''(m)(b-a)^3/24 + O(h^5)
    

    Simpson’s rule is the unique combination of endpoint and midpoint values that cancels the h^2 error:

    integral from a to b of f(x) dx = (b-a)/6 * [f(a) + 4f(m) + f(b)] + O(h^5)
    

    Remarkably, this also cancels the h^3 term—Simpson gets a “bonus degree.”

    Why Gauss-Legendre is Optimal

    For n evaluation points, we have 2n free parameters (n nodes + n weights). We can match 2n conditions: exact integration of 1, x, x^2, …, x^{2n-1}.

    The solution: nodes are roots of the n-th Legendre polynomial P_n(x). These orthogonal polynomials arise naturally from the optimization.

    Generic Programming: The Stepanov Way

    This module follows the Stepanov principle: algorithms arise from minimal operations.

    Read More →
  • Demystifying automatic differentiation: building a reverse-mode AD system in C++20.

    Automatic differentiation powers modern machine learning. PyTorch’s autograd, JAX’s grad, TensorFlow’s GradientTape—all rely on the same fundamental insight: we can compute exact derivatives by mechanically applying the chain rule.

    The Functional API

    The core interface is a single function: grad. It takes a function and returns its gradient function:

    auto f = [](const auto& x) {
        return sum(pow(x, 2.0));  // f(x) = sum(x^2)
    };
    
    auto df = grad(f);  // df(x) = 2x
    
    matrix<double> x{{1.0}, {2.0}, {3.0}};
    auto gradient = df(x);  // Returns {{2.0}, {4.0}, {6.0}}
    

    Because grad returns a callable, you can compose it:

    auto d2f = grad(grad(f));  // Second derivative
    

    No classes to instantiate, no global state to manage. Just functions transforming functions.

    How Reverse-Mode AD Works

    The grad function reveals the algorithm:

    template <typename F>
    auto grad(F&& f) {
        return [f = std::forward<F>(f)]<typename T>(const T& input) {
            graph g;                          // 1. Fresh graph
            auto x = g.make_var(input);       // 2. Create input variable
            auto y = f(x);                    // 3. Forward pass (builds graph)
            g.backward<output_type>(y);       // 4. Backward pass
            return g.gradient(x);             // 5. Extract gradient
        };
    }
    

    Forward pass: When f(x) executes, each operation creates a node in the graph. The node stores:

    • The computed value
    • References to parent nodes
    • A backward function implementing that operation’s Jacobian

    Backward pass: Starting from the output, we traverse the graph in reverse topological order. Each node’s backward function receives the upstream gradient and propagates it to parents via the chain rule.

    Implementing Jacobians

    Each differentiable operation provides a backward function. Let’s examine three matrix calculus results.

    Matrix Multiplication: C = AB

    If C feeds into a scalar loss L, we need dL/dA and dL/dB given dL/dC.

    The chain rule gives us:

    • dL/dA = (dL/dC) * B^T
    • dL/dB = A^T * (dL/dC)

    Determinant: d = det(A)

    This is a classic matrix calculus result:

    d(det A)/dA = det(A) * A^{-T}

    where A^{-T} denotes the transpose of the inverse.

    Matrix Inverse: B = A^{-1}

    Starting from A * A^{-1} = I and differentiating implicitly:

    dA * A^{-1} + A * dA^{-1} = 0
    dA^{-1} = -A^{-1} * dA * A^{-1}

    Applying the chain rule to a scalar loss L through B = A^{-1}:

    dL/dA = -A^{-T} * (dL/dB) * A^{-T}

    Gradient Accumulation

    When a variable appears multiple times in a computation, gradients accumulate:

    void accumulate_gradient(node_id id, const T& grad) {
        auto& n = nodes_.at(id.index);
        if (!n.gradient.has_value()) {
            n.gradient = grad;
        } else {
            auto& existing = std::any_cast<T&>(n.gradient);
            existing = existing + grad;
        }
    }
    

    For example, if y = x + x, then dy/dx = 2, not 1. Each use of x contributes its gradient.

    Read More →
  • Numerical differentiation via finite differences for C++20.

    Overview

    The derivative is defined as a limit:

    f'(x) = lim_{h->0} (f(x+h) - f(x)) / h
    

    We cannot take h = 0 on a computer, but we can choose small h. The art of numerical differentiation lies in choosing h wisely—small enough that the approximation is good, but not so small that floating-point errors dominate.

    Quick Start

    #include <finite-diff/finite_diff.hpp>
    #include <cmath>
    #include <iostream>
    
    int main() {
        using namespace finite_diff;
    
        // Derivative of sin(x) at x = 1
        auto f = [](double x) { return std::sin(x); };
    
        double x = 1.0;
        double df = central_difference(f, x);  // Uses optimal h
    
        std::cout << "f'(1) = " << df << "\n";         // ~0.5403
        std::cout << "cos(1) = " << std::cos(1.0) << "\n";  // 0.5403...
    }
    

    The Two Errors

    Every finite difference approximation has two sources of error:

    1. Truncation error: From the Taylor series approximation. Decreases as h -> 0.
    2. Round-off error: From floating-point arithmetic. Increases as h -> 0.

    Total error = Truncation + Round-off
                = O(h^p) + O(eps/h)
    

    where eps is approximately 2.2 x 10^-16 (machine epsilon for double).

    The optimal h minimizes total error:

    Method      Truncation  Optimal h  Typical accuracy
    Forward     O(h)        sqrt(eps)  ~8 digits
    Central     O(h^2)      eps^(1/3)  ~10 digits
    Five-point  O(h^4)      eps^(1/5)  ~12 digits

    API Reference

    First Derivatives

    // Forward difference: (f(x+h) - f(x)) / h
    // O(h) accuracy
    double df = forward_difference(f, x);
    
    // Backward difference: (f(x) - f(x-h)) / h
    // O(h) accuracy
    double df = backward_difference(f, x);
    
    // Central difference: (f(x+h) - f(x-h)) / (2h)
    // O(h^2) accuracy - the workhorse
    double df = central_difference(f, x);
    
    // Five-point stencil: higher accuracy formula
    // O(h^4) accuracy
    double df = five_point_stencil(f, x);
    

    Second Derivatives

    // Second derivative: (f(x+h) - 2f(x) + f(x-h)) / h^2
    double d2f = second_derivative(f, x);
    
    // Five-point second derivative: O(h^4)
    double d2f = second_derivative_five_point(f, x);
    

    Multivariate Functions

    // Gradient: grad(f)(x) for f: R^n -> R
    auto grad = gradient(f, x);
    
    // Directional derivative: D_v f(x)
    double Dvf = directional_derivative(f, x, v);
    
    // Hessian matrix: H[i][j] = d^2f/dx_i dx_j
    auto H = hessian(f, x);
    
    // Jacobian matrix for f: R^n -> R^m
    auto J = jacobian(f, x);
    

    Deriving the Formulas

    Forward Difference

    Taylor expand f(x+h):

    f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + O(h^3)
    

    Solve for f’(x):

    f'(x) = (f(x+h) - f(x))/h - (h/2)*f''(x) + O(h^2)
          = (f(x+h) - f(x))/h + O(h)
    

    Central Difference

    Taylor expand f(x+h) and f(x-h):

    f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + (h^3/6)*f'''(x) + O(h^4)
    f(x-h) = f(x) - h*f'(x) + (h^2/2)*f''(x) - (h^3/6)*f'''(x) + O(h^4)
    

    Subtract:

    Read More →
  • Forward-mode automatic differentiation via dual numbers for C++20.

    Overview

    Dual numbers are a simple yet powerful technique for computing exact derivatives. The key insight: if we extend our number system with an element epsilon where epsilon^2 = 0, then evaluating f(x + epsilon) yields f(x) + epsilon * f'(x). The derivative emerges automatically from the algebra.

    Quick Start

    #include <dual/dual.hpp>
    #include <iostream>
    
    int main() {
        using namespace dual;
    
        // Create a dual variable at x = 2
        auto x = dual<double>::variable(2.0);
    
        // Compute f(x) = x^3 - 3x + 1
        auto f = x*x*x - 3.0*x + 1.0;
    
        std::cout << "f(2) = " << f.value() << "\n";       // 3.0
        std::cout << "f'(2) = " << f.derivative() << "\n"; // 9.0
    }
    

    The Mathematics

    A dual number has the form a + b*epsilon where epsilon^2 = 0. Arithmetic follows naturally:

    (a + b*eps) + (c + d*eps) = (a+c) + (b+d)*eps
    (a + b*eps) * (c + d*eps) = ac + (ad + bc)*eps + bd*eps^2
                              = ac + (ad + bc)*eps
    

    Notice how the bd*eps^2 term vanishes because eps^2 = 0.

    For a function f, Taylor expansion gives:

    f(a + b*eps) = f(a) + b*f'(a)*eps + (b^2/2)*f''(a)*eps^2 + ...
                 = f(a) + b*f'(a)*eps
    

    If we set b = 1 (marking x as “the variable we’re differentiating with respect to”), then:

    f(x + eps) = f(x) + f'(x)*eps
    

    The derivative appears as the coefficient of epsilon!

    API Reference

    dual

    The core dual number type.

    // Create a variable for differentiation
    auto x = dual<double>::variable(3.0);  // x = 3, dx = 1
    
    // Create a constant
    auto c = dual<double>::constant(2.0);  // c = 2, dc = 0
    
    // Access values
    double val = x.value();       // 3.0
    double deriv = x.derivative(); // 1.0
    
    // Arithmetic operators: +, -, *, /
    auto y = sin(x*x) + exp(-x);
    
    // Convenience function
    auto [value, deriv] = differentiate([](auto x) { return x*x; }, 3.0);
    

    Mathematical Functions

    All standard math functions are supported with correct derivative propagation:

    • Basic: sqrt, cbrt, abs
    • Exponential: exp, exp2, expm1, log, log2, log10, log1p
    • Trigonometric: sin, cos, tan, asin, acos, atan, atan2
    • Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
    • Power: pow, hypot
    • Special: erf, erfc

    Higher-Order Derivatives

    Second derivatives with dual2:

    auto result = differentiate2([](auto x) { return sin(x); }, 1.0);
    // result.value  = sin(1)
    // result.first  = cos(1)
    // result.second = -sin(1)
    

    Arbitrary order with jets:

    // Compute f, f', f'', f''', f'''' at x = 1
    auto derivs = derivatives<4>([](auto x) { return exp(x); }, 1.0);
    // All derivatives of e^x at x=1 equal e
    

    Forward vs Reverse Mode

    This library implements forward mode AD:

    Read More →
  • Why build yet another linear algebra library? The world already has Eigen, Armadillo, Blaze, and countless others. The answer lies not in performance or features, but in pedagogy.

    elementa exists to teach three things simultaneously: linear algebra, modern C++, and numerical computing. Every design decision prioritizes clarity over cleverness. The code reads like a textbook that happens to compile.

    The Matrix Concept

    C++20 concepts let us express “what a matrix is” as a compile-time contract. Here’s the core definition:

    template <typename M>
    concept Matrix = requires(M m, const M cm, std::size_t i, std::size_t j) {
        typename M::scalar_type;
    
        { cm.rows() } -> std::same_as<std::size_t>;
        { cm.cols() } -> std::same_as<std::size_t>;
    
        { m(i, j) } -> std::same_as<typename M::scalar_type&>;
        { cm(i, j) } -> std::same_as<const typename M::scalar_type&>;
    
        { cm + cm } -> std::same_as<M>;
        { cm - cm } -> std::same_as<M>;
        { -cm } -> std::same_as<M>;
    };
    

    This concept says: a type M is a Matrix if it has a scalar_type, dimension queries, element access (both mutable and const), and basic arithmetic. Notice what’s absent: scalar multiplication. This omission avoids circular constraint issues with the operator* overload for matrix multiplication. Instead, we provide a scale() function for generic code.

    The beauty of concepts is that any type satisfying these constraints works with our algorithms—no inheritance required. We can write:

    template <Matrix M>
    auto det(const M& A) -> typename M::scalar_type;
    

    and this function works for matrix<double>, matrix<float>, or any future type satisfying Matrix.

    Clean API Design

    A pedagogical library needs an ergonomic API. elementa provides multiple ways to construct matrices:

    // Default: empty 0x0
    matrix<double> empty;
    
    // Filled with value
    matrix<double> zeros(3, 4, 0.0);  // 3x4 of zeros
    
    // Flat initializer list (row-major)
    matrix<double> flat(2, 3, {1, 2, 3, 4, 5, 6});
    
    // Nested initializer list (most natural)
    matrix<double> natural{{1, 2, 3},
                           {4, 5, 6}};
    

    Value semantics are enforced throughout. Operators like + and - return new matrices, marked [[nodiscard]] to prevent accidental discards.

    LU Decomposition: The Heart of the Library

    LU decomposition factors a matrix A into a lower triangular L and upper triangular U such that PA = LU, where P is a permutation matrix capturing row swaps. This single decomposition powers determinants, inverses, and linear system solving.

    The implementation uses partial pivoting: at each step, we find the largest absolute value in the current column to use as the pivot. This prevents division by small numbers that would amplify rounding errors.

    template <Arithmetic T>
    struct lu_result {
        matrix<T> L;                     // Lower triangular (unit diagonal)
        matrix<T> U;                     // Upper triangular
        std::vector<std::size_t> perm;   // Permutation vector
        int sign;                        // Sign of permutation (+1 or -1)
        bool singular;                   // True if matrix is singular
    };
    

    Derived Operations

    Once you have LU, everything else follows.

    Read More →
  • Polynomial arithmetic teaches a profound truth: the same GCD algorithm works for integers and polynomials.

    Overview

    Both integers and polynomials are examples of Euclidean domains—algebraic structures where division with remainder is always possible.

    // For integers: gcd(48, 18) = 6
    // For polynomials: gcd(x^3 - 1, x^2 - 1) = x - 1
    
    // Same algorithm, different types!
    template<euclidean_domain E>
    E gcd(E a, E b) {
        while (b != E(0)) {
            a = std::exchange(b, a % b);
        }
        return a;
    }
    

    Quick Start

    #include <polynomials/polynomial.hpp>
    #include <iostream>
    
    using namespace poly;
    
    int main() {
        // Create polynomial x^2 - 1 = (x-1)(x+1)
        auto p = polynomial<double>{-1, 0, 1};
    
        // Create polynomial x^3 - 1 = (x-1)(x^2+x+1)
        auto q = polynomial<double>{-1, 0, 0, 1};
    
        // GCD should be (x - 1)
        auto g = gcd(p, q);
    
        std::cout << "gcd(x^2-1, x^3-1) has degree " << g.degree() << "\n";  // 1
    
        // Find roots of x^2 - 1
        auto roots = find_roots(p, -10.0, 10.0);
        for (double r : roots) {
            std::cout << "Root: " << r << "\n";  // -1 and 1
        }
    }
    

    API Reference

    Creating Polynomials

    // From dense coefficients (a[i] = coefficient of x^i)
    polynomial<double> p{1, -2, 1};  // 1 - 2x + x^2
    
    // Monomial: coefficient * x^degree
    auto m = polynomial<double>::monomial(3.0, 4);  // 3x^4
    
    // The variable x
    auto x = polynomial<double>::x();  // x
    
    // Constant
    polynomial<double> c{5.0};  // 5
    

    Arithmetic

    auto sum = p + q;
    auto diff = p - q;
    auto prod = p * q;
    auto [quot, rem] = divmod(p, q);  // Division with remainder
    auto quot_only = p / q;
    auto rem_only = p % q;
    
    auto g = gcd(p, q);                    // Greatest common divisor
    auto [g2, s, t] = extended_gcd(p, q);  // Bezout: g2 = p*s + q*t
    auto l = lcm(p, q);                    // Least common multiple
    bool d = divides(p, q);                // Does p divide q?
    

    Evaluation and Calculus

    double val = evaluate(p, x);           // p(x)
    auto dp = derivative(p);               // p'(x)
    auto integral = antiderivative(p);     // integral of p
    auto roots = find_roots(p, -10, 10);   // All real roots in interval
    auto crit = stationary_points(p, -10, 10);  // Where p'(x) = 0
    

    The Euclidean Domain Insight

    What makes polynomials special is that they form a Euclidean domain:

    Property  Integers                      Polynomials
    Norm      abs(n)                        degree(p)
    Division  a = b*q + r, abs(r) < abs(b)  a = b*q + r, deg(r) < deg(b)
    GCD       gcd(48, 18) = 6               gcd(x^2-1, x-1) = x-1

    The same Euclidean algorithm works for both because they share this structure.

    Sparse Representation

    Polynomials are stored as sorted (degree, coefficient) pairs. This is efficient for sparse polynomials like x^1000 + 1 (only 2 terms stored, not 1001).

    Why This Matters

    The Euclidean domain insight exemplifies Stepanov’s philosophy: algorithms arise from algebraic structure. When you recognize that polynomials and integers share the same abstract structure, you immediately know that:

    • GCD works
    • Extended GCD works (Bezout’s identity)
    • Unique factorization holds
    • The Chinese Remainder Theorem applies

    One structure, many types, same algorithms.

  • Fractions that never lose precision

    The Problem with Floating-Point

    double x = 0.1 + 0.2;
    std::cout << (x == 0.3);  // Prints 0 (false!)
    

    Floating-point represents numbers as m x 2^e. The number 0.1 has no exact finite representation in binary, just as 1/3 has no exact finite representation in decimal.

    Rational arithmetic solves this: 1/3 stays exactly 1/3.

    The Implementation

    A rational is a pair (numerator, denominator) kept in lowest terms:

    template<std::integral T>
    class rat {
        T num_;  // numerator (carries sign)
        T den_;  // denominator (always positive)
    
        void reduce() {
            T g = std::gcd(num_, den_);  // std::gcd (from <numeric>) uses absolute values
            num_ /= g;
            den_ /= g;
        }
    };
    

    Key invariants:

    1. Denominator is always positive (sign lives in numerator)
    2. GCD(|num|, den) = 1 (always reduced)
    3. Zero is uniquely 0/1

    Arithmetic

    Addition requires a common denominator:

    a/b + c/d = (ad + bc) / bd
    

    Then reduce. The implementation:

    rat operator+(rat const& rhs) const {
        return rat(num_ * rhs.den_ + rhs.num_ * den_,
                   den_ * rhs.den_);
    }
    

    The constructor automatically reduces.

    Multiplication is simpler:

    (a/b) x (c/d) = ac / bd
    

    Division multiplies by the reciprocal:

    (a/b) / (c/d) = ad / bc
    

    Exact Comparison

    No floating-point fuzziness. Two reduced rationals are equal iff their numerators and denominators match:

    bool operator==(rat const& rhs) const {
        return num_ == rhs.num_ && den_ == rhs.den_;
    }
    

    For ordering, cross-multiply (valid when denominators are positive):

    a/b < c/d  iff  a x d < c x b
    

    The Mediant

    The mediant of a/b and c/d is (a+c)/(b+d). Unlike the average, it’s not halfway between, but it has remarkable properties:

    rat mediant(rat const& a, rat const& b) {
        return rat(a.numerator() + b.numerator(),
                   a.denominator() + b.denominator());
    }
    

    Properties:

    • If a/b < c/d, then a/b < mediant < c/d
    • The mediant is always in lowest terms if a/b and c/d are neighbors in the Stern-Brocot tree
    • Mediants generate all positive rationals exactly once

    Stern-Brocot Tree

    Start with 0/1 and 1/0 (infinity). Repeatedly take mediants:

    Level 0:     0/1                     1/0
    Level 1:     0/1       1/1           1/0
    Level 2:     0/1   1/2   1/1   2/1   1/0
    Level 3:  0/1 1/3 1/2 2/3 1/1 3/2 2/1 3/1 1/0
    

    Every positive rational appears exactly once. This connects to:

    • Continued fractions (path from root encodes CF)
    • Best rational approximations
    • Farey sequences

    GCD: The Unifying Thread

    Reducing fractions uses GCD. The GCD algorithm is:

    T gcd(T a, T b) {
        while (b != 0) {
            a = a % b;
            std::swap(a, b);
        }
        return a;
    }
    

    This is Euclid’s algorithm from ~300 BCE—the same algorithm that appears for any Euclidean domain.

    Rational numbers form a field: every non-zero element has a multiplicative inverse (the reciprocal). The invariants above (positive denominator, fully reduced) give each field element a unique representation.

    Read More →
  • The Miller-Rabin primality test and the mathematics of certainty

    The Problem

    Given a large number n, is it prime? Trial division up to sqrt(n) is too slow for cryptographic-sized numbers. We need something faster—and we’re willing to accept “probably prime” with quantifiable certainty.

    Fermat’s Little Theorem

    For prime p and any a not divisible by p:

    a^(p-1) = 1 (mod p)
    

    This suggests a test: pick random a, compute a^(n-1) mod n. If the result isn’t 1, n is definitely composite. But if it is 1, n might be prime… or might be a Carmichael number that fools this test.

    The Miller-Rabin Improvement

    Miller and Rabin observed something stronger. For odd prime p, write p-1 = 2^r x d (factor out all 2s). Then the sequence:

    a^d, a^(2d), a^(4d), ..., a^(2^r x d) = a^(p-1)
    

    must either:

    1. Start with 1, or
    2. Contain -1 (i.e., p-1) somewhere before reaching 1

    Why? Because the only square roots of 1 mod p are plus or minus 1. If we ever see 1 without first seeing -1, we’ve found a non-trivial square root of 1, proving n is composite.

    The Witness Test

    bool witness_test(int64_t n, int64_t a) {
        // Write n-1 = 2^r x d
        int64_t d = n - 1;
        int r = 0;
        while ((d & 1) == 0) { d >>= 1; r++; }
    
        // Compute x = a^d mod n
        int64_t x = mod_pow(a, d, n);
    
        if (x == 1 || x == n - 1) return true;  // Probably prime
    
        // Square r-1 times, looking for n-1
        for (int i = 1; i < r; i++) {
        x = static_cast<int64_t>(static_cast<__int128>(x) * x % n);  // widen: x*x can overflow int64_t
            if (x == n - 1) return true;
        }
    
        return false;  // Definitely composite
    }
    

    If witness_test(n, a) returns false, n is definitely composite—a is a “witness” to compositeness.

    Error Bounds

    The beautiful part: for any composite n, at least 3/4 of all possible witnesses a in [2, n-2] will detect it. This means each random witness has at most 1/4 chance of failing to detect a composite.

    With k independent witnesses:

    P(false positive) <= (1/4)^k
    
    Witnesses  Error bound
    10         < 10^-6
    20         < 10^-12
    40         < 10^-24

    Parameterizing by Error

    Rather than asking “how many iterations?”, ask “what error rate is acceptable?”:

    bool is_prime(int64_t n, double max_error = 1e-12) {
        int k = static_cast<int>(std::ceil(std::log(1.0 / max_error) / std::log(4.0)));
        // ... test with k random witnesses
    }
    

    The implementation also returns the actual error bound achieved:

    auto result = is_prime_with_error(n);
    if (result.probably_prime) {
        std::cout << "Prime with error < " << result.error_bound << "\n";
    }
    

    Modular Exponentiation

    The workhorse is computing a^e mod n efficiently. The peasant algorithm again:

    Read More →