These posts explore the philosophy and practice of generic programming as taught by Alex Stepanov, architect of the C++ Standard Template Library.
The central insight: algorithms arise from algebraic structure. The same algorithm that works on integers works on matrices, polynomials, and modular integers—not by accident, but because they share algebraic properties. When you recognize “this is a monoid” or “this is a Euclidean domain,” you immediately know which algorithms apply.
Each post demonstrates one beautiful idea with minimal, pedagogical C++ code (~100-300 lines). The implementations prioritize clarity over cleverness—code that reads like a textbook that happens to compile.
Core Topics
- Monoids and exponentiation: The Russian peasant algorithm reveals the universal structure of repeated operations
- Rings and fields: Modular arithmetic teaches which algorithms apply where
- Euclidean domains: The same GCD algorithm works for integers and polynomials
- Generic concepts: C++20 concepts express algebraic requirements as compile-time contracts
Further Reading
- Stepanov & Rose, From Mathematics to Generic Programming
- Stepanov & McJones, Elements of Programming
Read More →
A synthesis of three earlier posts, comparing forward-mode AD, reverse-mode AD, and numerical differentiation.
Computing derivatives is fundamental to optimization, machine learning, physics simulation, and numerical analysis. This series has explored three distinct approaches:
- Forward-mode AD via dual numbers
- Reverse-mode AD via computational graphs
- Numerical differentiation via finite differences
Each has different strengths. Understanding when to use each is essential for practical numerical computing.
The Landscape
| Method | Accuracy | Cost for f: R^n -> R | Cost for f: R -> R^m | Memory |
| --- | --- | --- | --- | --- |
| Forward AD | Exact | O(n) passes | O(1) pass | O(1) |
| Reverse AD | Exact | O(1) pass | O(m) passes | O(ops) |
| Finite Diff | O(h^p) | O(n) evaluations | O(n) evaluations | O(1) |

The key insight: problem structure determines the best method.
Forward-Mode AD: Dual Numbers
Forward-mode AD extends numbers with an infinitesimal epsilon where epsilon^2 = 0. The derivative emerges from arithmetic:
```cpp
// f(x) = x^3 - 3x + 1
// f'(x) = 3x^2 - 3
auto x = dual<double>::variable(2.0);  // x = 2, dx = 1
auto f = x*x*x - 3.0*x + 1.0;
std::cout << f.value() << "\n";        // 3.0
std::cout << f.derivative() << "\n";   // 9.0
```

Strengths:
- Simple implementation (operator overloading)
- No memory overhead
- Naturally composable for higher derivatives
- Works with any function of overloaded operators
When to use:
- Single input variable (or few inputs)
- Computing Jacobian-vector products
- Higher-order derivatives via nesting
- Sensitivity analysis along one direction
Complexity: One forward pass per input variable. For f: R^n -> R^m, computing the full Jacobian requires n passes.
Reverse-Mode AD: Computational Graphs
Reverse-mode AD builds a computational graph during the forward pass, then propagates gradients backward via the chain rule:
```cpp
auto f = [](const auto& x) {
    return sum(pow(x, 2.0));  // f(x) = sum(x^2)
};
auto df = grad(f);       // Returns gradient function
auto gradient = df(x);   // One backward pass for all partials
```

Strengths:
- O(1) backward passes regardless of input dimension
- Powers modern deep learning (backpropagation)
- Efficient for loss functions: f: R^n -> R
When to use:
- Many inputs, scalar output (neural networks)
- Computing vector-Jacobian products
- Optimization where you need full gradient
Complexity: One forward pass to build graph, one backward pass to compute all gradients. Memory scales with number of operations (must store intermediate values).
Numerical Differentiation: Finite Differences
Approximate the derivative using the limit definition:
```cpp
// Central difference: f'(x) ~ (f(x+h) - f(x-h)) / 2h
double df = central_difference(f, x);
```

Strengths:
- Works with black-box functions
- No special types required
- Simple to implement and understand
Limitations:
Read More →
Sean Parent’s technique for value-semantic polymorphism, and algebraic extensions
The Problem
You want a container that holds different types:
```cpp
std::vector<???> items;
items.push_back(42);
items.push_back(std::string("hello"));
items.push_back(MyCustomType{});
```

Traditional OOP says: make everything inherit from a base class. But that's intrusive—you can't add `int` to your hierarchy.
The Solution: Type Erasure
Store the interface in an abstract class, but wrap any type that provides the needed operations:
```cpp
class any_regular {
    struct concept_t {
        virtual ~concept_t() = default;
        virtual std::unique_ptr<concept_t> clone() const = 0;
        virtual bool equals(concept_t const&) const = 0;
    };
    template<typename T>
    struct model_t : concept_t {
        T value;
        // Implement concept_t using T's operations
    };
    std::unique_ptr<concept_t> self_;
};
```

The magic: `model_t<int>`, `model_t<string>`, `model_t<MyType>` are all different types, but they all inherit from `concept_t`. We've moved the inheritance inside the wrapper.
The Pattern
- `concept_t`: Abstract base defining required operations
- `model_t<T>`: Template that wraps any type T and implements `concept_t`
- Wrapper class: Holds `unique_ptr<concept_t>`, provides value semantics
The wrapper looks like a value but can hold any type:
```cpp
any_regular a = 42;        // Stores model_t<int>
any_regular b = "hello"s;  // Stores model_t<string>
any_regular c = a;         // Deep copy via clone()
```

Value Semantics
The key is `clone()`. Copying the wrapper deep-copies the held value:

```cpp
any_regular(any_regular const& other)
    : self_(other.self_ ? other.self_->clone() : nullptr) {}
```

This gives us value semantics: copies are independent, assignment replaces content, no shared state surprises.
Beyond Regular: Algebraic Interfaces
Here’s where Stepanov’s insight applies. What if we type-erase algebraic structure?
any_ring: Erasing Ring Operations
```cpp
class any_ring {
    struct concept_t {
        virtual std::unique_ptr<concept_t> clone() const = 0;
        virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
        virtual std::unique_ptr<concept_t> multiply(concept_t const&) const = 0;
        virtual std::unique_ptr<concept_t> negate() const = 0;
        virtual std::unique_ptr<concept_t> zero() const = 0;
        virtual std::unique_ptr<concept_t> one() const = 0;
        virtual bool equals(concept_t const&) const = 0;
    };
    // ...
};
```

Now `any_ring` can hold integers, polynomials, matrices, or any ring—and you can do arithmetic on it at runtime:

```cpp
any_ring a = 42;
any_ring b = polynomial{1, 2, 3};  // Different types!
// a + b would throw (type mismatch), but:
any_ring c = a + any_ring(10);     // Works: 52
```

any_vector_space: Linear Algebra Abstraction
A vector space over a field F has:
- Vector addition: v + w
- Scalar multiplication: alpha * v
- Zero vector: 0
```cpp
template<typename Scalar>
class any_vector {
    struct concept_t {
        virtual std::unique_ptr<concept_t> clone() const = 0;
        virtual std::unique_ptr<concept_t> add(concept_t const&) const = 0;
        virtual std::unique_ptr<concept_t> scale(Scalar) const = 0;
        virtual std::unique_ptr<concept_t> zero() const = 0;
        virtual Scalar inner_product(concept_t const&) const = 0;
    };
    // ...
};
```

This lets you write algorithms that work on any vector space:

```cpp
// Gram-Schmidt orthogonalization - works on any inner product space
template<typename Scalar>
std::vector<any_vector<Scalar>> gram_schmidt(
    std::vector<any_vector<Scalar>> const& basis
) {
    std::vector<any_vector<Scalar>> orthonormal;
    for (auto const& v : basis) {
        auto u = v;
        for (auto const& e : orthonormal) {
            // u = u - <u,e>*e
            u = u - e.scale(u.inner_product(e));
        }
        // Normalize u
        Scalar norm = std::sqrt(u.inner_product(u));
        orthonormal.push_back(u.scale(1.0 / norm));
    }
    return orthonormal;
}
```

The Stepanov Connection
This is Stepanov’s approach at the type level:
- Identify the algebraic structure (ring, vector space, etc.)
- Define operations axiomatically (not by inheritance)
- Let any type that provides the operations participate
The concept/model pattern makes this work at runtime. It’s the same philosophy as C++20 concepts, but with runtime dispatch instead of compile-time.
Read More →
Numerical integration (quadrature) for C++20.
Overview
The definite integral represents the signed area under a curve:
integral from a to b of f(x) dx

Since most functions lack closed-form antiderivatives, we approximate integrals numerically using quadrature rules—weighted sums of function evaluations:

integral from a to b of f(x) dx ~ sum of w_i * f(x_i)

Different rules choose different nodes x_i and weights w_i, trading off accuracy and computational cost.
Quick Start
```cpp
#include <integration/integrate.hpp>
#include <cmath>
#include <iostream>

int main() {
    using namespace integration;
    // integral from 0 to pi of sin(x) dx = 2
    double result = integrate([](double x) { return std::sin(x); },
                              0.0, 3.14159265);
    std::cout << "integral of sin(x) dx = " << result << "\n";  // ~2.0
}
```

The Quadrature Zoo
Basic Rules
| Rule | Formula | Error | Exact for |
| --- | --- | --- | --- |
| Midpoint | (b-a)*f(m) | O(h^3) | Linear |
| Trapezoidal | (b-a)/2*(f(a)+f(b)) | O(h^3) | Linear |
| Simpson's | (b-a)/6*(f(a)+4f(m)+f(b)) | O(h^5) | Cubic |

```cpp
double m = midpoint_rule(f, a, b);
double t = trapezoidal_rule(f, a, b);
double s = simpsons_rule(f, a, b);
```

Composite Rules
Divide [a,b] into n subintervals and apply the basic rule to each:
```cpp
// Error: O(h^2) where h = (b-a)/n
double m = composite_midpoint(f, a, b, 100);
double t = composite_trapezoidal(f, a, b, 100);

// Error: O(h^4) - much more accurate!
double s = composite_simpsons(f, a, b, 100);  // n must be even
```

Gauss-Legendre Quadrature
The optimal choice: n points exactly integrate polynomials of degree 2n-1.
```cpp
// 5-point Gauss-Legendre: exact for degree <= 9
double g = gauss_legendre<5>(f, a, b);
```

Adaptive Integration
Automatically refines where the function is “difficult”:
```cpp
// Recommended for general use
double result = integrate(f, a, b);           // Default tolerance 1e-10
double result2 = integrate(f, a, b, 1e-12);   // Custom tolerance

// With error estimate
auto [value, error, evals] = integrate_with_error(f, a, b);
```

Deriving Simpson's Rule
Taylor expand f(x) around the midpoint m = (a+b)/2. Odd powers vanish by symmetry when integrating from a to b:
integral from a to b of f(x) dx ~ (b-a)f(m) + f''(m)(b-a)^3/24 + O(h^5)

Simpson's rule is the unique combination of endpoint and midpoint values that cancels the h^2 error:

integral from a to b of f(x) dx = (b-a)/6 * [f(a) + 4f(m) + f(b)] + O(h^5)

Remarkably, this also cancels the h^3 term—Simpson gets a "bonus degree."
Why Gauss-Legendre is Optimal
For n evaluation points, we have 2n free parameters (n nodes + n weights). We can match 2n conditions: exact integration of 1, x, x^2, …, x^{2n-1}.
The solution: nodes are roots of the n-th Legendre polynomial P_n(x). These orthogonal polynomials arise naturally from the optimization.
Generic Programming: The Stepanov Way
This module follows the Stepanov principle: algorithms arise from minimal operations.
Read More →
Demystifying automatic differentiation: building a reverse-mode AD system in C++20.
Automatic differentiation powers modern machine learning. PyTorch's `autograd`, JAX's `grad`, TensorFlow's `GradientTape`—all rely on the same fundamental insight: we can compute exact derivatives by mechanically applying the chain rule.
The Functional API
The core interface is a single function: `grad`. It takes a function and returns its gradient function:

```cpp
auto f = [](const auto& x) {
    return sum(pow(x, 2.0));  // f(x) = sum(x^2)
};
auto df = grad(f);            // df(x) = 2x

matrix<double> x{{1.0}, {2.0}, {3.0}};
auto gradient = df(x);        // Returns {{2.0}, {4.0}, {6.0}}
```

Because `grad` returns a callable, you can compose it:

```cpp
auto d2f = grad(grad(f));  // Second derivative
```

No classes to instantiate, no global state to manage. Just functions transforming functions.
How Reverse-Mode AD Works
The `grad` function reveals the algorithm:

```cpp
template <typename F>
auto grad(F&& f) {
    return [f = std::forward<F>(f)]<typename T>(const T& input) {
        graph g;                      // 1. Fresh graph
        auto x = g.make_var(input);   // 2. Create input variable
        auto y = f(x);                // 3. Forward pass (builds graph)
        g.backward<output_type>(y);   // 4. Backward pass
        return g.gradient(x);         // 5. Extract gradient
    };
}
```

Forward pass: When `f(x)` executes, each operation creates a node in the graph. The node stores:
- The computed value
- References to parent nodes
- A backward function implementing that operation’s Jacobian
Backward pass: Starting from the output, we traverse the graph in reverse topological order. Each node’s backward function receives the upstream gradient and propagates it to parents via the chain rule.
Implementing Jacobians
Each differentiable operation provides a backward function. Let’s examine three matrix calculus results.
Matrix Multiplication: C = AB
If the loss is L, and C appears in L, we need dL/dA and dL/dB given dL/dC.
The chain rule gives us:
- dL/dA = (dL/dC) * B^T
- dL/dB = A^T * (dL/dC)
Determinant: d = det(A)
This is a classic matrix calculus result:
d(det A)/dA = det(A) * A^{-T}
where A^{-T} denotes the transpose of the inverse.
Matrix Inverse: B = A^{-1}
Starting from A * A^{-1} = I and differentiating implicitly:
dA * A^{-1} + A * dA^{-1} = 0
dA^{-1} = -A^{-1} * dA * A^{-1}
Applying the chain rule to a scalar loss L through B = A^{-1}:
dL/dA = -A^{-T} * (dL/dB) * A^{-T}
Gradient Accumulation
When a variable appears multiple times in a computation, gradients accumulate:
```cpp
void accumulate_gradient(node_id id, const T& grad) {
    auto& n = nodes_.at(id.index);
    if (!n.gradient.has_value()) {
        n.gradient = grad;
    } else {
        auto& existing = std::any_cast<T&>(n.gradient);
        existing = existing + grad;
    }
}
```

For example, if `y = x + x`, then dy/dx = 2, not 1. Each use of `x` contributes its gradient.
Read More →
Numerical differentiation via finite differences for C++20.
Overview
The derivative is defined as a limit:
f'(x) = lim_{h->0} (f(x+h) - f(x)) / h

We cannot take h = 0 on a computer, but we can choose small h. The art of numerical differentiation lies in choosing h wisely—small enough that the approximation is good, but not so small that floating-point errors dominate.
Quick Start
```cpp
#include <finite-diff/finite_diff.hpp>
#include <cmath>
#include <iostream>

int main() {
    using namespace finite_diff;
    // Derivative of sin(x) at x = 1
    auto f = [](double x) { return std::sin(x); };
    double x = 1.0;
    double df = central_difference(f, x);  // Uses optimal h
    std::cout << "f'(1)  = " << df << "\n";             // ~0.5403
    std::cout << "cos(1) = " << std::cos(1.0) << "\n";  // 0.5403...
}
```

The Two Errors
Every finite difference approximation has two sources of error:
- Truncation error: From the Taylor series approximation. Decreases as h -> 0.
- Round-off error: From floating-point arithmetic. Increases as h -> 0.
Total error = Truncation + Round-off = O(h^p) + O(eps/h)

where eps is approximately 2.2 x 10^-16 (machine epsilon for `double`).
The optimal h minimizes total error:
| Method | Truncation | Optimal h | Typical accuracy |
| --- | --- | --- | --- |
| Forward | O(h) | sqrt(eps) | ~8 digits |
| Central | O(h^2) | eps^(1/3) | ~10 digits |
| Five-point | O(h^4) | eps^(1/5) | ~12 digits |

API Reference
First Derivatives
```cpp
// Forward difference: (f(x+h) - f(x)) / h
// O(h) accuracy
double df1 = forward_difference(f, x);

// Backward difference: (f(x) - f(x-h)) / h
// O(h) accuracy
double df2 = backward_difference(f, x);

// Central difference: (f(x+h) - f(x-h)) / (2h)
// O(h^2) accuracy - the workhorse
double df3 = central_difference(f, x);

// Five-point stencil: higher accuracy formula
// O(h^4) accuracy
double df4 = five_point_stencil(f, x);
```

Second Derivatives
```cpp
// Second derivative: (f(x+h) - 2f(x) + f(x-h)) / h^2
double d2f = second_derivative(f, x);

// Five-point second derivative: O(h^4)
double d2f5 = second_derivative_five_point(f, x);
```

Multivariate Functions
```cpp
// Gradient: grad(f)(x) for f: R^n -> R
auto grad = gradient(f, x);

// Directional derivative: D_v f(x)
double Dvf = directional_derivative(f, x, v);

// Hessian matrix: H[i][j] = d^2f/dx_i dx_j
auto H = hessian(f, x);

// Jacobian matrix for f: R^n -> R^m
auto J = jacobian(f, x);
```

Deriving the Formulas
Forward Difference
Taylor expand f(x+h):
f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + O(h^3)

Solve for f'(x):

f'(x) = (f(x+h) - f(x))/h - (h/2)*f''(x) + O(h^2)
      = (f(x+h) - f(x))/h + O(h)

Central Difference
Taylor expand f(x+h) and f(x-h):
f(x+h) = f(x) + h*f'(x) + (h^2/2)*f''(x) + (h^3/6)*f'''(x) + O(h^4)
f(x-h) = f(x) - h*f'(x) + (h^2/2)*f''(x) - (h^3/6)*f'''(x) + O(h^4)

Subtract:
Read More →
Forward-mode automatic differentiation via dual numbers for C++20.
Overview
Dual numbers are a simple yet powerful technique for computing exact derivatives. The key insight: if we extend our number system with an element `epsilon` where `epsilon^2 = 0`, then evaluating `f(x + epsilon)` yields `f(x) + epsilon * f'(x)`. The derivative emerges automatically from the algebra.
Quick Start
```cpp
#include <dual/dual.hpp>
#include <iostream>

int main() {
    using namespace dual;
    // Create a dual variable at x = 2
    auto x = dual<double>::variable(2.0);
    // Compute f(x) = x^3 - 3x + 1
    auto f = x*x*x - 3.0*x + 1.0;
    std::cout << "f(2)  = " << f.value() << "\n";       // 3.0
    std::cout << "f'(2) = " << f.derivative() << "\n";  // 9.0
}
```

The Mathematics
A dual number has the form a + b*epsilon where epsilon^2 = 0. Arithmetic follows naturally:

(a + b*eps) + (c + d*eps) = (a+c) + (b+d)*eps
(a + b*eps) * (c + d*eps) = ac + (ad + bc)*eps + bd*eps^2
                          = ac + (ad + bc)*eps

Notice how the bd*eps^2 term vanishes because eps^2 = 0.
For a function f, Taylor expansion gives:

f(a + b*eps) = f(a) + b*f'(a)*eps + (b^2/2)*f''(a)*eps^2 + ...
             = f(a) + b*f'(a)*eps

If we set b = 1 (marking x as "the variable we're differentiating with respect to"), then:

f(x + eps) = f(x) + f'(x)*eps

The derivative appears as the coefficient of epsilon!
API Reference
dual&lt;T&gt;
The core dual number type.
```cpp
// Create a variable for differentiation
auto x = dual<double>::variable(3.0);  // x = 3, dx = 1

// Create a constant
auto c = dual<double>::constant(2.0);  // c = 2, dc = 0

// Access values
double val = x.value();         // 3.0
double deriv = x.derivative();  // 1.0

// Arithmetic operators: +, -, *, /
auto y = sin(x*x) + exp(-x);

// Convenience function
auto [value, d] = differentiate([](auto x) { return x*x; }, 3.0);
```

Mathematical Functions
All standard math functions are supported with correct derivative propagation:
- Basic: `sqrt`, `cbrt`, `abs`
- Exponential: `exp`, `exp2`, `expm1`, `log`, `log2`, `log10`, `log1p`
- Trigonometric: `sin`, `cos`, `tan`, `asin`, `acos`, `atan`, `atan2`
- Hyperbolic: `sinh`, `cosh`, `tanh`, `asinh`, `acosh`, `atanh`
- Power: `pow`, `hypot`
- Special: `erf`, `erfc`
Higher-Order Derivatives
Second derivatives with dual2:
```cpp
auto result = differentiate2([](auto x) { return sin(x); }, 1.0);
// result.value  = sin(1)
// result.first  = cos(1)
// result.second = -sin(1)
```

Arbitrary order with jets:
```cpp
// Compute f, f', f'', f''', f'''' at x = 1
auto derivs = derivatives<4>([](auto x) { return exp(x); }, 1.0);
// All derivatives of e^x at x=1 equal e
```

Forward vs Reverse Mode
This library implements forward mode AD:
Read More →
Why build yet another linear algebra library? The world already has Eigen, Armadillo, Blaze, and countless others. The answer lies not in performance or features, but in pedagogy.
elementa exists to teach three things simultaneously: linear algebra, modern C++, and numerical computing. Every design decision prioritizes clarity over cleverness. The code reads like a textbook that happens to compile.
The Matrix Concept
C++20 concepts let us express “what a matrix is” as a compile-time contract. Here’s the core definition:
```cpp
template <typename M>
concept Matrix = requires(M m, const M cm, std::size_t i, std::size_t j) {
    typename M::scalar_type;
    { cm.rows() } -> std::same_as<std::size_t>;
    { cm.cols() } -> std::same_as<std::size_t>;
    { m(i, j) }  -> std::same_as<typename M::scalar_type&>;
    { cm(i, j) } -> std::same_as<const typename M::scalar_type&>;
    { cm + cm }  -> std::same_as<M>;
    { cm - cm }  -> std::same_as<M>;
    { -cm }      -> std::same_as<M>;
};
```

This concept says: a type `M` is a Matrix if it has a `scalar_type`, dimension queries, element access (both mutable and const), and basic arithmetic. Notice what's absent: scalar multiplication. This omission avoids circular constraint issues with the `operator*` overload for matrix multiplication. Instead, we provide a `scale()` function for generic code.
The beauty of concepts is that any type satisfying these constraints works with our algorithms—no inheritance required. We can write:
```cpp
template <Matrix M>
auto det(const M& A) -> typename M::scalar_type;
```

and this function works for `matrix<double>`, `matrix<float>`, or any future type satisfying `Matrix`.
Clean API Design
A pedagogical library needs an ergonomic API. elementa provides multiple ways to construct matrices:
```cpp
// Default: empty 0x0
matrix<double> empty;

// Filled with value
matrix<double> zeros(3, 4, 0.0);  // 3x4 of zeros

// Flat initializer list (row-major)
matrix<double> flat(2, 3, {1, 2, 3, 4, 5, 6});

// Nested initializer list (most natural)
matrix<double> natural{{1, 2, 3}, {4, 5, 6}};
```

Value semantics are enforced throughout. Operators like `+` and `-` return new matrices, marked `[[nodiscard]]` to prevent accidental discards.
LU Decomposition: The Heart of the Library
LU decomposition factors a matrix A into a lower triangular L and upper triangular U such that PA = LU, where P is a permutation matrix capturing row swaps. This single decomposition powers determinants, inverses, and linear system solving.
The implementation uses partial pivoting: at each step, we find the largest absolute value in the current column to use as the pivot. This prevents division by small numbers that would amplify rounding errors.
```cpp
template <Arithmetic T>
struct lu_result {
    matrix<T> L;                    // Lower triangular (unit diagonal)
    matrix<T> U;                    // Upper triangular
    std::vector<std::size_t> perm;  // Permutation vector
    int sign;                       // Sign of permutation (+1 or -1)
    bool singular;                  // True if matrix is singular
};
```

Derived Operations
Once you have LU, everything else follows.
Polynomial arithmetic teaches a profound truth: the same GCD algorithm works for integers and polynomials.
Overview
Both integers and polynomials are examples of Euclidean domains—algebraic structures where division with remainder is always possible.
```cpp
// For integers:    gcd(48, 18) = 6
// For polynomials: gcd(x^3 - 1, x^2 - 1) = x - 1
// Same algorithm, different types!
template<euclidean_domain E>
E gcd(E a, E b) {
    while (b != E(0)) {
        a = std::exchange(b, a % b);
    }
    return a;
}
```

Quick Start
```cpp
#include <polynomials/polynomial.hpp>
#include <iostream>

using namespace poly;

int main() {
    // Create polynomial x^2 - 1 = (x-1)(x+1)
    auto p = polynomial<double>{-1, 0, 1};
    // Create polynomial x^3 - 1 = (x-1)(x^2+x+1)
    auto q = polynomial<double>{-1, 0, 0, 1};

    // GCD should be (x - 1)
    auto g = gcd(p, q);
    std::cout << "gcd(x^2-1, x^3-1) has degree " << g.degree() << "\n";  // 1

    // Find roots of x^2 - 1
    auto roots = find_roots(p, -10.0, 10.0);
    for (double r : roots) {
        std::cout << "Root: " << r << "\n";  // -1 and 1
    }
}
```

API Reference
Creating Polynomials
```cpp
// From dense coefficients (a[i] = coefficient of x^i)
polynomial<double> p{1, -2, 1};  // 1 - 2x + x^2

// Monomial: coefficient * x^degree
auto m = polynomial<double>::monomial(3.0, 4);  // 3x^4

// The variable x
auto x = polynomial<double>::x();  // x

// Constant
polynomial<double> c{5.0};  // 5
```

Arithmetic
```cpp
auto sum = p + q;
auto diff = p - q;
auto prod = p * q;
auto [quot, rem] = divmod(p, q);  // Division with remainder
auto quot_only = p / q;
auto rem_only = p % q;
```

GCD and Related
```cpp
auto g = gcd(p, q);                    // Greatest common divisor
auto [g2, s, t] = extended_gcd(p, q);  // Bezout: g = p*s + q*t
auto l = lcm(p, q);                    // Least common multiple
bool d = divides(p, q);                // Does p divide q?
```

Evaluation and Calculus
```cpp
double val = evaluate(p, x);                // p(x)
auto dp = derivative(p);                    // p'(x)
auto integral = antiderivative(p);          // integral of p
auto roots = find_roots(p, -10, 10);        // All real roots in interval
auto crit = stationary_points(p, -10, 10);  // Where p'(x) = 0
```

The Euclidean Domain Insight
What makes polynomials special is that they form a Euclidean domain:
| Property | Integers | Polynomials |
| --- | --- | --- |
| Norm | abs(n) | degree(p) |
| Division | a = b*q + r, abs(r) < abs(b) | a = b*q + r, deg(r) < deg(b) |
| GCD | gcd(48, 18) = 6 | gcd(x^2-1, x-1) = x-1 |

The same Euclidean algorithm works for both because they share this structure.
Sparse Representation
Polynomials are stored as sorted (degree, coefficient) pairs. This is efficient for sparse polynomials like `x^1000 + 1` (only 2 terms stored, not 1001).
Why This Matters
The Euclidean domain insight exemplifies Stepanov’s philosophy: algorithms arise from algebraic structure. When you recognize that polynomials and integers share the same abstract structure, you immediately know that:
- GCD works
- Extended GCD works (Bezout’s identity)
- Unique factorization holds
- The Chinese Remainder Theorem applies
One structure, many types, same algorithms.
Read More →
Fractions that never lose precision
The Problem with Floating-Point
```cpp
double x = 0.1 + 0.2;
std::cout << (x == 0.3);  // Prints 0 (false!)
```

Floating-point represents numbers as m x 2^e. The number 0.1 has no exact finite representation in binary, just as 1/3 has no exact finite representation in decimal.
Rational arithmetic solves this: 1/3 stays exactly 1/3.
The Implementation
A rational is a pair (numerator, denominator) kept in lowest terms:
```cpp
template<std::integral T>
class rat {
    T num_;  // numerator (carries sign)
    T den_;  // denominator (always positive)

    void reduce() {
        T g = std::gcd(abs(num_), den_);
        num_ /= g;
        den_ /= g;
    }
};
```

Key invariants:
- Denominator is always positive (sign lives in numerator)
- GCD(|num|, den) = 1 (always reduced)
- Zero is uniquely 0/1
Arithmetic
Addition requires a common denominator:
a/b + c/d = (ad + bc) / bd

Then reduce. The implementation:

```cpp
rat operator+(rat const& rhs) const {
    return rat(num_ * rhs.den_ + rhs.num_ * den_,
               den_ * rhs.den_);
}
```

The constructor automatically reduces.
Multiplication is simpler:
(a/b) x (c/d) = ac / bd

Division multiplies by the reciprocal:

(a/b) / (c/d) = ad / bc

Exact Comparison
No floating-point fuzziness. Two reduced rationals are equal iff their numerators and denominators match:
```cpp
bool operator==(rat const& rhs) const {
    return num_ == rhs.num_ && den_ == rhs.den_;
}
```

For ordering, cross-multiply (valid when denominators are positive):

a/b < c/d iff a x d < c x b

The Mediant
The mediant of a/b and c/d is (a+c)/(b+d). Unlike the average, it’s not halfway between, but it has remarkable properties:
```cpp
rat mediant(rat const& a, rat const& b) {
    return rat(a.numerator() + b.numerator(),
               a.denominator() + b.denominator());
}
```

Properties:
- If a/b < c/d, then a/b < mediant < c/d
- The mediant is always in lowest terms if a/b and c/d are neighbors in the Stern-Brocot tree
- Mediants generate all positive rationals exactly once
Stern-Brocot Tree
Start with 0/1 and 1/0 (infinity). Repeatedly take mediants:
```
Level 0:  0/1                                   1/0
Level 1:  0/1                1/1                1/0
Level 2:  0/1      1/2       1/1       2/1      1/0
Level 3:  0/1  1/3 1/2 2/3   1/1   3/2 2/1 3/1  1/0
```

Every positive rational appears exactly once. This connects to:
- Continued fractions (path from root encodes CF)
- Best rational approximations
- Farey sequences
GCD: The Unifying Thread
Reducing fractions uses GCD. The GCD algorithm is:
```cpp
T gcd(T a, T b) {
    while (b != 0) {
        a = a % b;
        std::swap(a, b);
    }
    return a;
}
```

This is Euclid's algorithm from ~300 BCE—the same algorithm that appears for any Euclidean domain.
Rational numbers form a field: every non-zero element has a multiplicative inverse (the reciprocal). The denominator requirement (non-zero, reduced) comes from the algebraic structure.
Read More →
The Miller-Rabin primality test and the mathematics of certainty
The Problem
Given a large number n, is it prime? Trial division up to sqrt(n) is too slow for cryptographic-sized numbers. We need something faster—and we’re willing to accept “probably prime” with quantifiable certainty.
Fermat’s Little Theorem
For prime p and any a not divisible by p:
a^(p-1) = 1 (mod p)

This suggests a test: pick random a, compute a^(n-1) mod n. If the result isn't 1, n is definitely composite. But if it is 1, n might be prime… or might be a Carmichael number that fools this test.
The Miller-Rabin Improvement
Miller and Rabin observed something stronger. For odd prime p, write p-1 = 2^r x d (factor out all 2s). Then the sequence:
a^d, a^(2d), a^(4d), ..., a^(2^r x d) = a^(p-1)

must either:
- Start with 1, or
- Contain -1 (i.e., p-1) somewhere before reaching 1
Why? Because the only square roots of 1 mod p are plus or minus 1. If we ever see 1 without first seeing -1, we’ve found a non-trivial square root of 1, proving n is composite.
The Witness Test
```cpp
bool witness_test(int64_t n, int64_t a) {
    // Write n-1 = 2^r x d
    int64_t d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; r++; }

    // Compute x = a^d mod n
    int64_t x = mod_pow(a, d, n);
    if (x == 1 || x == n - 1) return true;  // Probably prime

    // Square r-1 times, looking for n-1
    for (int i = 1; i < r; i++) {
        x = (x * x) % n;  // needs a 128-bit intermediate for large n
        if (x == n - 1) return true;
    }
    return false;  // Definitely composite
}
```

If `witness_test(n, a)` returns false, n is definitely composite—a is a "witness" to compositeness.
The beautiful part: for any composite n, at least 3/4 of all possible witnesses a in [2, n-2] will detect it. This means each random witness has at most 1/4 chance of failing to detect a composite.
With k independent witnesses:
P(false positive) <= (1/4)^k

| Witnesses | Error bound |
| --- | --- |
| 10 | < 10^-6 |
| 20 | < 10^-12 |
| 40 | < 10^-24 |

Parameterizing by Error
Rather than asking “how many iterations?”, ask “what error rate is acceptable?”:
```cpp
bool is_prime(int64_t n, double max_error = 1e-12) {
    int k = ceil(-log(max_error) / log(4));
    // ... test with k random witnesses
}
```

The implementation also returns the actual error bound achieved:
```cpp
auto result = is_prime_with_error(n);
if (result.probably_prime) {
    std::cout << "Prime with error < " << result.error_bound << "\n";
}
```

Modular Exponentiation
The workhorse is computing a^e mod n efficiently. The peasant algorithm again: