These posts explore the philosophy and practice of generic programming as taught by Alex Stepanov, architect of the C++ Standard Template Library.

The central insight: algorithms arise from algebraic structure. The same algorithm that works on integers works on matrices, polynomials, and modular integers—not by accident, but because they share algebraic properties. When you recognize “this is a monoid” or “this is a Euclidean domain,” you immediately know which algorithms apply.

Each post demonstrates one beautiful idea with minimal, pedagogical C++ code (~100-300 lines). The implementations prioritize clarity over cleverness—code that reads like a textbook that happens to compile.

Core Topics

  • Monoids and exponentiation: The Russian peasant algorithm reveals the universal structure of repeated operations
  • Rings and fields: Modular arithmetic teaches which algorithms apply where
  • Euclidean domains: The same GCD algorithm works for integers and polynomials
  • Generic concepts: C++20 concepts express algebraic requirements as compile-time contracts

Further Reading

  • Stepanov & Rose, From Mathematics to Generic Programming
  • Stepanov & McJones, Elements of Programming
Posts

  • A synthesis of three earlier posts, comparing forward-mode AD, reverse-mode AD, and numerical differentiation.

    Computing derivatives is fundamental to optimization, machine learning, physics simulation, and numerical analysis. This series has explored three distinct approaches:

    1. Forward-mode AD via dual numbers
    2. Reverse-mode AD via computational graphs
    3. Numerical differentiation via finite differences

    Each has different strengths. Understanding when to use each is essential for practical numerical computing.

    The Landscape

    Method       Accuracy  Cost for f: R^n -> R  Cost for f: R -> R^m  Memory
    Forward AD   Exact     O(n) passes           O(1) pass             O(1)
    Reverse AD   Exact     O(1) pass             O(m) passes           O(ops)
    Finite Diff  O(h^p)    O(n) evaluations      O(n) evaluations      O(1)

    The key insight: problem structure determines the best method.

    Forward-Mode AD: Dual Numbers

    Forward-mode AD extends numbers with an infinitesimal epsilon where epsilon^2 = 0. The derivative emerges from arithmetic:

    // f(x) = x^3 - 3x + 1
    // f'(x) = 3x^2 - 3
    
    auto x = dual<double>::variable(2.0);  // x = 2, dx = 1
    auto f = x*x*x - 3.0*x + 1.0;
    
    std::cout << f.value() << "\n";       // 3.0
    std::cout << f.derivative() << "\n";  // 9.0
    

    Strengths:

    • Simple implementation (operator overloading)
    • No memory overhead
    • Naturally composable for higher derivatives
    • Works with any function of overloaded operators

    When to use:

    • Single input variable (or few inputs)
    • Computing Jacobian-vector products
    • Higher-order derivatives via nesting
    • Sensitivity analysis along one direction

    Complexity: One forward pass per input variable. For f: R^n -> R^m, computing the full Jacobian requires n passes.

    Reverse-Mode AD: Computational Graphs

    Reverse-mode AD builds a computational graph during the forward pass, then propagates gradients backward via the chain rule:

    auto f = [](const auto& x) {
        return sum(pow(x, 2.0));  // f(x) = sum(x^2)
    };
    
    auto df = grad(f);  // Returns gradient function
    auto gradient = df(x);  // One backward pass for all partials
    

    Strengths:

    • O(1) backward passes regardless of input dimension
    • Powers modern deep learning (backpropagation)
    • Efficient for loss functions: f: R^n -> R

    When to use:

    • Many inputs, scalar output (neural networks)
    • Computing vector-Jacobian products
    • Optimization where you need full gradient

    Complexity: One forward pass to build graph, one backward pass to compute all gradients. Memory scales with number of operations (must store intermediate values).

    Numerical Differentiation: Finite Differences

    Approximate the derivative using the limit definition:

    // Central difference: f'(x) ~ (f(x+h) - f(x-h)) / 2h
    double df = central_difference(f, x);
    

    Strengths:

    • Works with black-box functions
    • No special types required
    • Simple to implement and understand

    Limitations:

    Read More →
  • Sean Parent’s technique for value-semantic polymorphism, extended to algebraic structures

    The Problem

    You want a container that holds different types:

    std::vector<???> items;
    items.push_back(42);
    items.push_back(std::string("hello"));
    items.push_back(MyCustomType{});
    

    Traditional OOP says: make everything inherit from a base class. But that’s intrusive—you can’t add int to your hierarchy.

    The Solution: Type Erasure

    Store the interface in an abstract class, but wrap any type that provides the needed operations:

    class any_regular {
        struct concept_t {
            virtual ~concept_t() = default;
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual bool equals(concept_t const&) const = 0;
        };
    
        template<typename T>
        struct model_t : concept_t {
            T value;
            // Implement concept_t using T's operations
        };
    
        std::unique_ptr<concept_t> self_;
    };
    

    The magic: model_t<int>, model_t<string>, model_t<MyType> are all different types, but they all inherit from concept_t. We’ve moved the inheritance inside the wrapper.

    The Pattern

    1. concept_t: Abstract base defining required operations
    2. model_t: Template that wraps any type T and implements concept_t
    3. Wrapper class: Holds unique_ptr<concept_t>, provides value semantics

    The wrapper looks like a value but can hold any type:

    any_regular a = 42;           // Stores model_t<int>
    any_regular b = "hello"s;     // Stores model_t<string>
    any_regular c = a;            // Deep copy via clone()
    

    Value Semantics

    The key is clone(). Copying the wrapper deep-copies the held value:

    any_regular(any_regular const& other)
        : self_(other.self_ ? other.self_->clone() : nullptr) {}
    

    This gives us value semantics: copies are independent, assignment replaces content, no shared state surprises.

    Beyond Regular: Algebraic Interfaces

    Here’s where Stepanov’s insight applies. What if we type-erase algebraic structure?

    any_ring: Erasing Ring Operations

    class any_ring {
        struct concept_t {
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> multiply(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> negate() const = 0;
            virtual std::unique_ptr<concept_t> zero() const = 0;
            virtual std::unique_ptr<concept_t> one() const = 0;
            virtual bool equals(concept_t const&) const = 0;
        };
        // ...
    };
    

    Now any_ring can hold integers, polynomials, matrices, or any ring—and you can do arithmetic on it at runtime:

    any_ring a = 42;
    any_ring b = polynomial{1, 2, 3};  // Different types!
    // a + b would throw (type mismatch), but:
    any_ring c = a + any_ring(10);     // Works: 52
    

    any_vector_space: Linear Algebra Abstraction

    A vector space over a field F has:

    • Vector addition: v + w
    • Scalar multiplication: alpha * v
    • Zero vector: 0

    template<typename Scalar>
    class any_vector {
        struct concept_t {
            virtual std::unique_ptr<concept_t> clone() const = 0;
            virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
            virtual std::unique_ptr<concept_t> scale(Scalar) const = 0;
            virtual std::unique_ptr<concept_t> zero() const = 0;
            virtual Scalar inner_product(concept_t const&) const = 0;
        };
        // ...
    };
    

    This lets you write algorithms that work on any vector space:

    // Gram-Schmidt orthogonalization - works on any inner product space
    template<typename Scalar>
    std::vector<any_vector<Scalar>> gram_schmidt(
        std::vector<any_vector<Scalar>> const& basis
    ) {
        std::vector<any_vector<Scalar>> orthonormal;
        for (auto const& v : basis) {
            auto u = v;
            for (auto const& e : orthonormal) {
                // u = u - <u,e>*e
                u = u - e.scale(u.inner_product(e));
            }
            // Normalize u
            Scalar norm = std::sqrt(u.inner_product(u));
            orthonormal.push_back(u.scale(1.0 / norm));
        }
        return orthonormal;
    }
    

    The Stepanov Connection

    This is Stepanov’s approach at the type level:

    1. Identify the algebraic structure (ring, vector space, etc.)
    2. Define operations axiomatically (not by inheritance)
    3. Let any type that provides the operations participate

    The concept/model pattern makes this work at runtime. It’s the same philosophy as C++20 concepts, but with runtime dispatch instead of compile-time.

    Read More →
  • Numerical integration (quadrature) for C++20.

    Overview

    The definite integral represents the signed area under a curve:

    integral from a to b of f(x) dx
    

    Since most functions lack closed-form antiderivatives, we approximate integrals numerically using quadrature rules—weighted sums of function evaluations:

    integral from a to b of f(x) dx ~ sum of w_i * f(x_i)
    

    Different rules choose different nodes x_i and weights w_i, trading off accuracy and computational cost.

    Quick Start

    #include <integration/integrate.hpp>
    #include <cmath>
    #include <iostream>
    
    int main() {
        using namespace integration;
    
        // integral from 0 to pi of sin(x) dx = 2
        double result = integrate([](double x) { return std::sin(x); }, 0.0, 3.14159265);
    
        std::cout << "integral of sin(x) dx = " << result << "\n";  // ~2.0
    }
    

    The Quadrature Zoo

    Basic Rules

    Rule         Formula                    Error   Exact for
    Midpoint     (b-a)*f(m)                 O(h^3)  Linear
    Trapezoidal  (b-a)/2*(f(a)+f(b))        O(h^3)  Linear
    Simpson’s    (b-a)/6*(f(a)+4f(m)+f(b))  O(h^5)  Cubic

    double m = midpoint_rule(f, a, b);
    double t = trapezoidal_rule(f, a, b);
    double s = simpsons_rule(f, a, b);
    

    Composite Rules

    Divide [a,b] into n subintervals and apply the basic rule to each:

    // Error: O(h^2) where h = (b-a)/n
    double m = composite_midpoint(f, a, b, 100);
    double t = composite_trapezoidal(f, a, b, 100);
    
    // Error: O(h^4) - much more accurate!
    double s = composite_simpsons(f, a, b, 100);  // n must be even
    

    Gauss-Legendre Quadrature

    The optimal choice: n points exactly integrate polynomials of degree 2n-1.

    // 5-point Gauss-Legendre: exact for degree <= 9
    double g = gauss_legendre<5>(f, a, b);
    

    Adaptive Integration

    Automatically refines where the function is “difficult”:

    // Recommended for general use
    double result = integrate(f, a, b);              // Default tolerance 1e-10
    double result = integrate(f, a, b, 1e-12);       // Custom tolerance
    
    // With error estimate
    auto [value, error, evals] = integrate_with_error(f, a, b);
    

    Deriving Simpson’s Rule

    Taylor expand f(x) around the midpoint m = (a+b)/2. Odd powers vanish by symmetry when integrating from a to b:

    integral from a to b of f(x) dx ~ (b-a)f(m) + f''(m)(b-a)^3/24 + O(h^5)
    

    Simpson’s rule is the unique combination of endpoint and midpoint values that cancels the h^2 error:

    integral from a to b of f(x) dx = (b-a)/6 * [f(a) + 4f(m) + f(b)] + O(h^5)
    

    Remarkably, this also cancels the h^3 term—Simpson gets a “bonus degree.”

    Why Gauss-Legendre is Optimal

    For n evaluation points, we have 2n free parameters (n nodes + n weights). We can match 2n conditions: exact integration of 1, x, x^2, …, x^{2n-1}.

    The solution: nodes are roots of the n-th Legendre polynomial P_n(x). These orthogonal polynomials arise naturally from the optimization.

    Generic Programming: The Stepanov Way

    This module follows the Stepanov principle: algorithms arise from minimal operations.

    Read More →
  • Demystifying automatic differentiation: building a reverse-mode AD system in C++20.

    Automatic differentiation powers modern machine learning. PyTorch’s autograd, JAX’s grad, TensorFlow’s GradientTape—all rely on the same fundamental insight: we can compute exact derivatives by mechanically applying the chain rule.

    The Functional API

    The core interface is a single function: grad. It takes a function and returns its gradient function:

    auto f = [](const auto& x) {
        return sum(pow(x, 2.0));  // f(x) = sum(x^2)
    };
    
    auto df = grad(f);  // df(x) = 2x
    
    matrix<double> x{{1.0}, {2.0}, {3.0}};
    auto gradient = df(x);  // Returns {{2.0}, {4.0}, {6.0}}
    

    Because grad returns a callable, you can compose it:

    auto d2f = grad(grad(f));  // Second derivative
    

    No classes to instantiate, no global state to manage. Just functions transforming functions.

    How Reverse-Mode AD Works

    The grad function reveals the algorithm:

    template <typename F>
    auto grad(F&& f) {
        return [f = std::forward<F>(f)]<typename T>(const T& input) {
            graph g;                          // 1. Fresh graph
            auto x = g.make_var(input);       // 2. Create input variable
            auto y = f(x);                    // 3. Forward pass (builds graph)
            g.backward<output_type>(y);       // 4. Backward pass
            return g.gradient(x);             // 5. Extract gradient
        };
    }
    

    Forward pass: When f(x) executes, each operation creates a node in the graph. The node stores:

    • The computed value
    • References to parent nodes
    • A backward function implementing that operation’s Jacobian

    Backward pass: Starting from the output, we traverse the graph in reverse topological order. Each node’s backward function receives the upstream gradient and propagates it to parents via the chain rule.

    Implementing Jacobians

    Each differentiable operation provides a backward function. Let’s examine three matrix calculus results.

    Matrix Multiplication: C = AB

    If C feeds into a scalar loss L, we need dL/dA and dL/dB given dL/dC.

    The chain rule gives us:

    • dL/dA = (dL/dC) * B^T
    • dL/dB = A^T * (dL/dC)

    Determinant: d = det(A)

    This is a classic matrix calculus result:

    d(det A)/dA = det(A) * A^{-T}

    where A^{-T} denotes the transpose of the inverse.

    Matrix Inverse: B = A^{-1}

    Starting from A * A^{-1} = I and differentiating implicitly:

    dA * A^{-1} + A * dA^{-1} = 0
    dA^{-1} = -A^{-1} * dA * A^{-1}

    Applying the chain rule to a scalar loss L through B = A^{-1}:

    dL/dA = -A^{-T} * (dL/dB) * A^{-T}

    Gradient Accumulation

    When a variable appears multiple times in a computation, gradients accumulate:

    void accumulate_gradient(node_id id, const T& grad) {
        auto& n = nodes_.at(id.index);
        if (!n.gradient.has_value()) {
            n.gradient = grad;
        } else {
            auto& existing = std::any_cast<T&>(n.gradient);
            existing = existing + grad;
        }
    }
    

    For example, if y = x + x, then dy/dx = 2, not 1. Each use of x contributes its gradient.

    Read More →
  • Numerical differentiation via finite differences for C++20.

    Overview

    The derivative is defined as a limit:

    f'(x) = lim_{h->0} (f(x+h) - f(x)) / h
    

    We cannot take h = 0 on a computer, but we can choose small h. The art of numerical differentiation lies in choosing h wisely—small enough that the approximation is good, but not so small that floating-point errors dominate.

    Quick Start

    #include <finite-diff/finite_diff.hpp>
    #include <cmath>
    #include <iostream>
    
    int main() {
        using namespace finite_diff;
    
        // Derivative of sin(x) at x = 1
        auto f = [](double x) { return std::sin(x); };
    
        double x = 1.0;
        double df = central_difference(f, x);  // Uses optimal h
    
        std::cout << "f'(1) = " << df << "\n";         // ~0.5403
        std::cout << "cos(1) = " << std::cos(1.0) << "\n";  // 0.5403...
    }
    

    The Two Errors

    Every finite difference approximation has two sources of error:

    1. Truncation error: From the Taylor series approximation. Decreases as h -> 0.
    2. Round-off error: From floating-point arithmetic. Increases as h -> 0.

    Total error = Truncation + Round-off
                = O(h^p) + O(eps/h)
    

    where eps is approximately 2.2 x 10^-16 (machine epsilon for double).

    The optimal h minimizes total error:

    Method      Truncation  Optimal h  Typical accuracy
    Forward     O(h)        sqrt(eps)  ~8 digits
    Central     O(h^2)      eps^(1/3)  ~10 digits
    Five-point  O(h^4)      eps^(1/5)  ~12 digits

    API Reference

    First Derivatives

    // Forward difference: (f(x+h) - f(x)) / h
    // O(h) accuracy
    double df = forward_difference(f, x);
    
    // Backward difference: (f(x) - f(x-h)) / h
    // O(h) accuracy
    double df = backward_difference(f, x);
    
    // Central difference: (f(x+h) - f(x-h)) / (2h)
    // O(h^2) accuracy - the workhorse
    double df = central_difference(f, x);
    
    // Five-point stencil: higher accuracy formula
    // O(h^4) accuracy
    double df = five_point_stencil(f, x);
    

    Second Derivatives

    // Second derivative: (f(x+h) - 2f(x) + f(x-h)) / h^2
    double d2f = second_derivative(f, x);
    
    // Five-point second derivative: O(h^4)
    double d2f = second_derivative_five_point(f, x);
    

    Multivariate Functions

    // Gradient: grad(f)(x) for f: R^n -> R
    auto grad = gradient(f, x);
    
    // Directional derivative: D_v f(x)
    double Dvf = directional_derivative(f, x, v);
    
    // Hessian matrix: H[i][j] = d^2f/dx_i dx_j
    auto H = hessian(f, x);
    
    // Jacobian matrix for f: R^n -> R^m
    auto J = jacobian(f, x);
    

    Deriving the Formulas

    Forward Difference

    Taylor expand f(x+h):

    f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + O(h^3)
    

    Solve for f’(x):

    f'(x) = (f(x+h) - f(x))/h - (h/2)*f''(x) + O(h^2)
          = (f(x+h) - f(x))/h + O(h)
    

    Central Difference

    Taylor expand f(x+h) and f(x-h):

    f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + (h^3/6)*f'''(x) + O(h^4)
    f(x-h) = f(x) - h*f'(x) + (h^2/2)*f''(x) - (h^3/6)*f'''(x) + O(h^4)
    

    Subtract:

    Read More →
  • Forward-mode automatic differentiation via dual numbers for C++20.

    Overview

    Dual numbers are a simple yet powerful technique for computing exact derivatives. The key insight: if we extend our number system with an element epsilon where epsilon^2 = 0, then evaluating f(x + epsilon) yields f(x) + epsilon * f'(x). The derivative emerges automatically from the algebra.

    Quick Start

    #include <dual/dual.hpp>
    #include <iostream>
    
    int main() {
        using namespace dual;
    
        // Create a dual variable at x = 2
        auto x = dual<double>::variable(2.0);
    
        // Compute f(x) = x^3 - 3x + 1
        auto f = x*x*x - 3.0*x + 1.0;
    
        std::cout << "f(2) = " << f.value() << "\n";       // 3.0
        std::cout << "f'(2) = " << f.derivative() << "\n"; // 9.0
    }
    

    The Mathematics

    A dual number has the form a + b*epsilon where epsilon^2 = 0. Arithmetic follows naturally:

    (a + b*eps) + (c + d*eps) = (a+c) + (b+d)*eps
    (a + b*eps) * (c + d*eps) = ac + (ad + bc)*eps + bd*eps^2
                              = ac + (ad + bc)*eps
    

    Notice how the bd*eps^2 term vanishes because eps^2 = 0.

    For a function f, Taylor expansion gives:

    f(a + b*eps) = f(a) + b*f'(a)*eps + (b^2/2)*f''(a)*eps^2 + ...
                 = f(a) + b*f'(a)*eps
    

    If we set b = 1 (marking x as “the variable we’re differentiating with respect to”), then:

    f(x + eps) = f(x) + f'(x)*eps
    

    The derivative appears as the coefficient of epsilon!

    API Reference

    dual

    The core dual number type.

    // Create a variable for differentiation
    auto x = dual<double>::variable(3.0);  // x = 3, dx = 1
    
    // Create a constant
    auto c = dual<double>::constant(2.0);  // c = 2, dc = 0
    
    // Access values
    double val = x.value();       // 3.0
    double deriv = x.derivative(); // 1.0
    
    // Arithmetic operators: +, -, *, /
    auto y = sin(x*x) + exp(-x);
    
    // Convenience function
    auto [value, deriv] = differentiate([](auto x) { return x*x; }, 3.0);
    

    Mathematical Functions

    All standard math functions are supported with correct derivative propagation:

    • Basic: sqrt, cbrt, abs
    • Exponential: exp, exp2, expm1, log, log2, log10, log1p
    • Trigonometric: sin, cos, tan, asin, acos, atan, atan2
    • Hyperbolic: sinh, cosh, tanh, asinh, acosh, atanh
    • Power: pow, hypot
    • Special: erf, erfc

    Higher-Order Derivatives

    Second derivatives with dual2:

    auto result = differentiate2([](auto x) { return sin(x); }, 1.0);
    // result.value  = sin(1)
    // result.first  = cos(1)
    // result.second = -sin(1)
    

    Arbitrary order with jets:

    // Compute f, f', f'', f''', f'''' at x = 1
    auto derivs = derivatives<4>([](auto x) { return exp(x); }, 1.0);
    // All derivatives of e^x at x=1 equal e
    

    Forward vs Reverse Mode

    This library implements forward mode AD:

    Read More →
  • Why build yet another linear algebra library? The world already has Eigen, Armadillo, Blaze, and countless others. The answer lies not in performance or features, but in pedagogy.

    elementa exists to teach three things simultaneously: linear algebra, modern C++, and numerical computing. Every design decision prioritizes clarity over cleverness. The code reads like a textbook that happens to compile.

    The Matrix Concept

    C++20 concepts let us express “what a matrix is” as a compile-time contract. Here’s the core definition:

    template <typename M>
    concept Matrix = requires(M m, const M cm, std::size_t i, std::size_t j) {
        typename M::scalar_type;
    
        { cm.rows() } -> std::same_as<std::size_t>;
        { cm.cols() } -> std::same_as<std::size_t>;
    
        { m(i, j) } -> std::same_as<typename M::scalar_type&>;
        { cm(i, j) } -> std::same_as<const typename M::scalar_type&>;
    
        { cm + cm } -> std::same_as<M>;
        { cm - cm } -> std::same_as<M>;
        { -cm } -> std::same_as<M>;
    };
    

    This concept says: a type M is a Matrix if it has a scalar_type, dimension queries, element access (both mutable and const), and basic arithmetic. Notice what’s absent: scalar multiplication. This omission avoids circular constraint issues with the operator* overload for matrix multiplication. Instead, we provide a scale() function for generic code.

    The beauty of concepts is that any type satisfying these constraints works with our algorithms—no inheritance required. We can write:

    template <Matrix M>
    auto det(const M& A) -> typename M::scalar_type;
    

    and this function works for matrix<double>, matrix<float>, or any future type satisfying Matrix.

    Clean API Design

    A pedagogical library needs an ergonomic API. elementa provides multiple ways to construct matrices:

    // Default: empty 0x0
    matrix<double> empty;
    
    // Filled with value
    matrix<double> zeros(3, 4, 0.0);  // 3x4 of zeros
    
    // Flat initializer list (row-major)
    matrix<double> flat(2, 3, {1, 2, 3, 4, 5, 6});
    
    // Nested initializer list (most natural)
    matrix<double> natural{{1, 2, 3},
                           {4, 5, 6}};
    

    Value semantics are enforced throughout. Operators like + and - return new matrices, marked [[nodiscard]] to prevent accidental discards.

    LU Decomposition: The Heart of the Library

    LU decomposition factors a matrix A into a lower triangular L and upper triangular U such that PA = LU, where P is a permutation matrix capturing row swaps. This single decomposition powers determinants, inverses, and linear system solving.

    The implementation uses partial pivoting: at each step, we find the largest absolute value in the current column to use as the pivot. This prevents division by small numbers that would amplify rounding errors.

    template <Arithmetic T>
    struct lu_result {
        matrix<T> L;                     // Lower triangular (unit diagonal)
        matrix<T> U;                     // Upper triangular
        std::vector<std::size_t> perm;   // Permutation vector
        int sign;                        // Sign of permutation (+1 or -1)
        bool singular;                   // True if matrix is singular
    };
    

    Derived Operations

    Once you have LU, everything else follows.

    Read More →
  • Polynomial arithmetic teaches a profound truth: the same GCD algorithm works for integers and polynomials.

    Overview

    Both integers and polynomials are examples of Euclidean domains—algebraic structures where division with remainder is always possible.

    // For integers: gcd(48, 18) = 6
    // For polynomials: gcd(x^3 - 1, x^2 - 1) = x - 1
    
    // Same algorithm, different types!
    template<euclidean_domain E>
    E gcd(E a, E b) {
        while (b != E(0)) {
            a = std::exchange(b, a % b);
        }
        return a;
    }
    

    Quick Start

    #include <polynomials/polynomial.hpp>
    #include <iostream>
    
    using namespace poly;
    
    int main() {
        // Create polynomial x^2 - 1 = (x-1)(x+1)
        auto p = polynomial<double>{-1, 0, 1};
    
        // Create polynomial x^3 - 1 = (x-1)(x^2+x+1)
        auto q = polynomial<double>{-1, 0, 0, 1};
    
        // GCD should be (x - 1)
        auto g = gcd(p, q);
    
        std::cout << "gcd(x^2-1, x^3-1) has degree " << g.degree() << "\n";  // 1
    
        // Find roots of x^2 - 1
        auto roots = find_roots(p, -10.0, 10.0);
        for (double r : roots) {
            std::cout << "Root: " << r << "\n";  // -1 and 1
        }
    }
    

    API Reference

    Creating Polynomials

    // From dense coefficients (a[i] = coefficient of x^i)
    polynomial<double> p{1, -2, 1};  // 1 - 2x + x^2
    
    // Monomial: coefficient * x^degree
    auto m = polynomial<double>::monomial(3.0, 4);  // 3x^4
    
    // The variable x
    auto x = polynomial<double>::x();  // x
    
    // Constant
    polynomial<double> c{5.0};  // 5
    

    Arithmetic

    auto sum = p + q;
    auto diff = p - q;
    auto prod = p * q;
    auto [quot, rem] = divmod(p, q);  // Division with remainder
    auto quot_only = p / q;
    auto rem_only = p % q;
    
    auto g = gcd(p, q);                    // Greatest common divisor
    auto [g2, s, t] = extended_gcd(p, q);  // Bezout: g2 = p*s + q*t
    auto l = lcm(p, q);                    // Least common multiple
    bool d = divides(p, q);                // Does p divide q?
    

    Evaluation and Calculus

    double val = evaluate(p, x);           // p(x)
    auto dp = derivative(p);               // p'(x)
    auto integral = antiderivative(p);     // integral of p
    auto roots = find_roots(p, -10, 10);   // All real roots in interval
    auto crit = stationary_points(p, -10, 10);  // Where p'(x) = 0
    

    The Euclidean Domain Insight

    What makes polynomials special is that they form a Euclidean domain:

    Property  Integers                      Polynomials
    Norm      abs(n)                        degree(p)
    Division  a = b*q + r, abs(r) < abs(b)  a = b*q + r, deg(r) < deg(b)
    GCD       gcd(48, 18) = 6               gcd(x^2-1, x-1) = x-1

    The same Euclidean algorithm works for both because they share this structure.

    Sparse Representation

    Polynomials are stored as sorted (degree, coefficient) pairs. This is efficient for sparse polynomials like x^1000 + 1 (only 2 terms stored, not 1001).

    Why This Matters

    The Euclidean domain insight exemplifies Stepanov’s philosophy: algorithms arise from algebraic structure. When you recognize that polynomials and integers share the same abstract structure, you immediately know that:

    • GCD works
    • Extended GCD works (Bezout’s identity)
    • Unique factorization holds
    • The Chinese Remainder Theorem applies

    One structure, many types, same algorithms.

  • Fractions that never lose precision

    The Problem with Floating-Point

    double x = 0.1 + 0.2;
    std::cout << (x == 0.3);  // Prints 0 (false!)
    

    Floating-point represents numbers as m x 2^e. The number 0.1 has no exact finite representation in binary, just as 1/3 has no exact finite representation in decimal.

    Rational arithmetic solves this: 1/3 stays exactly 1/3.

    The Implementation

    A rational is a pair (numerator, denominator) kept in lowest terms:

    template<std::integral T>
    class rat {
        T num_;  // numerator (carries sign)
        T den_;  // denominator (always positive)
    
        void reduce() {
            T g = std::gcd(num_, den_);  // std::gcd (from <numeric>) uses absolute values
            num_ /= g;
            den_ /= g;
        }
    };
    

    Key invariants:

    1. Denominator is always positive (sign lives in numerator)
    2. GCD(|num|, den) = 1 (always reduced)
    3. Zero is uniquely 0/1

    Arithmetic

    Addition requires a common denominator:

    a/b + c/d = (ad + bc) / bd
    

    Then reduce. The implementation:

    rat operator+(rat const& rhs) const {
        return rat(num_ * rhs.den_ + rhs.num_ * den_,
                   den_ * rhs.den_);
    }
    

    The constructor automatically reduces.

    Multiplication is simpler:

    (a/b) x (c/d) = ac / bd
    

    Division multiplies by the reciprocal:

    (a/b) / (c/d) = ad / bc
    

    Exact Comparison

    No floating-point fuzziness. Two reduced rationals are equal iff their numerators and denominators match:

    bool operator==(rat const& rhs) const {
        return num_ == rhs.num_ && den_ == rhs.den_;
    }
    

    For ordering, cross-multiply (valid when denominators are positive):

    a/b < c/d  iff  a x d < c x b
    

    The Mediant

    The mediant of a/b and c/d is (a+c)/(b+d). Unlike the average, it’s not halfway between, but it has remarkable properties:

    rat mediant(rat const& a, rat const& b) {
        return rat(a.numerator() + b.numerator(),
                   a.denominator() + b.denominator());
    }
    

    Properties:

    • If a/b < c/d, then a/b < mediant < c/d
    • The mediant is always in lowest terms if a/b and c/d are neighbors in the Stern-Brocot tree
    • Mediants generate all positive rationals exactly once

    Stern-Brocot Tree

    Start with 0/1 and 1/0 (infinity). Repeatedly take mediants:

    Level 0:     0/1                     1/0
    Level 1:     0/1       1/1           1/0
    Level 2:     0/1   1/2   1/1   2/1   1/0
    Level 3:  0/1 1/3 1/2 2/3 1/1 3/2 2/1 3/1 1/0
    

    Every positive rational appears exactly once. This connects to:

    • Continued fractions (path from root encodes CF)
    • Best rational approximations
    • Farey sequences

    GCD: The Unifying Thread

    Reducing fractions uses GCD. The GCD algorithm is:

    T gcd(T a, T b) {
        while (b != 0) {
            a = a % b;
            std::swap(a, b);
        }
        return a;
    }
    

    This is Euclid’s algorithm from ~300 BCE—the same algorithm that appears for any Euclidean domain.

    Rational numbers form a field: every non-zero element has a multiplicative inverse (the reciprocal). The invariants above (positive denominator, fully reduced) give each field element a unique representation.

    Read More →
  • The Miller-Rabin primality test and the mathematics of certainty

    The Problem

    Given a large number n, is it prime? Trial division up to sqrt(n) is too slow for cryptographic-sized numbers. We need something faster—and we’re willing to accept “probably prime” with quantifiable certainty.

    Fermat’s Little Theorem

    For prime p and any a not divisible by p:

    a^(p-1) = 1 (mod p)
    

    This suggests a test: pick random a, compute a^(n-1) mod n. If the result isn’t 1, n is definitely composite. But if it is 1, n might be prime… or might be a Carmichael number that fools this test.

    The Miller-Rabin Improvement

    Miller and Rabin observed something stronger. For odd prime p, write p-1 = 2^r x d (factor out all 2s). Then the sequence:

    a^d, a^(2d), a^(4d), ..., a^(2^r x d) = a^(p-1)
    

    must either:

    1. Start with 1, or
    2. Contain -1 (i.e., p-1) somewhere before reaching 1

    Why? Because the only square roots of 1 mod p are plus or minus 1. If we ever see 1 without first seeing -1, we’ve found a non-trivial square root of 1, proving n is composite.

    The Witness Test

    bool witness_test(int64_t n, int64_t a) {
        // Write n-1 = 2^r x d
        int64_t d = n - 1;
        int r = 0;
        while ((d & 1) == 0) { d >>= 1; r++; }
    
        // Compute x = a^d mod n
        int64_t x = mod_pow(a, d, n);
    
        if (x == 1 || x == n - 1) return true;  // Probably prime
    
        // Square r-1 times, looking for n-1
        for (int i = 1; i < r; i++) {
        x = static_cast<int64_t>(static_cast<__int128>(x) * x % n);  // widen: x*x can overflow int64_t
            if (x == n - 1) return true;
        }
    
        return false;  // Definitely composite
    }
    

    If witness_test(n, a) returns false, n is definitely composite—a is a “witness” to compositeness.

    Error Bounds

    The beautiful part: for any composite n, at least 3/4 of all possible witnesses a in [2, n-2] will detect it. This means each random witness has at most 1/4 chance of failing to detect a composite.

    With k independent witnesses:

    P(false positive) <= (1/4)^k
    
    Witnesses  Error bound
    10         < 10^-6
    20         < 10^-12
    40         < 10^-24

    Parameterizing by Error

    Rather than asking “how many iterations?”, ask “what error rate is acceptable?”:

    bool is_prime(int64_t n, double max_error = 1e-12) {
        int k = static_cast<int>(std::ceil(std::log(1.0 / max_error) / std::log(4.0)));
        // ... test with k random witnesses
    }
    

    The implementation also returns the actual error bound achieved:

    auto result = is_prime_with_error(n);
    if (result.probably_prime) {
        std::cout << "Prime with error < " << result.error_bound << "\n";
    }
    

    Modular Exponentiation

    The workhorse is computing a^e mod n efficiently. The peasant algorithm again:

    Read More →