femtograd
Automatic Differentiation for Statistical Computing
Browse posts by tag
Automatic Differentiation for Statistical Computing
Exact Derivatives via Automatic Differentiation
Many structures come in pairs: forward/reverse AD, push/pull iteration, encode/decode. Recognizing duality lets you transfer theorems and insights between domains.
Three approaches to computing derivatives, forward-mode AD, reverse-mode AD, and finite differences, each with different trade-offs for numerical computing and machine learning.
Reverse-mode automatic differentiation is just the chain rule applied systematically. I built one in C++20 to understand what PyTorch and JAX are actually doing.
Dual numbers extend our number system with an infinitesimal epsilon where epsilon^2 = 0. Evaluating f(x + epsilon) yields f(x) + epsilon * f'(x)—the derivative emerges automatically from the algebra.