
algebraic.dist: Distributions as Algebraic Objects in R

Most statistical software treats probability distributions as parameter sets you pass to sampling or density functions. algebraic.dist takes a different approach. Distributions are algebraic objects that compose, transform, and combine through standard mathematical operations.

The Idea

Instead of this:

x <- rnorm(1000, mean=5, sd=2)
y <- rnorm(1000, mean=3, sd=1)
z <- x + y  # Just numeric vectors

You write:

X <- Normal(mean=5, sd=2)
Y <- Normal(mean=3, sd=1)
Z <- X + Y  # A new distribution object!
sample(Z, 1000)

Assuming X and Y are independent, the sum Z knows it is Normal(mean=8, sd=sqrt(5)) because the algebra works it out: the means add (5 + 3 = 8) and the variances add (4 + 1 = 5). You never lose the distributional structure.
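To make the mechanism concrete, here is a minimal base-R sketch of how an S3 `+` method can return a distribution instead of numbers. The class name `toy_normal` and its fields are hypothetical stand-ins chosen for illustration; this is not the algebraic.dist API, and it assumes the two operands are independent.

```r
# Hypothetical toy class for illustration only; not the algebraic.dist API.
toy_normal <- function(mean, sd) {
  stopifnot(is.numeric(mean), is.numeric(sd), sd > 0)
  structure(list(mean = mean, sd = sd), class = "toy_normal")
}

# Sum of two independent normals: means add, variances add.
"+.toy_normal" <- function(e1, e2) {
  stopifnot(inherits(e1, "toy_normal"), inherits(e2, "toy_normal"))
  toy_normal(e1$mean + e2$mean, sqrt(e1$sd^2 + e2$sd^2))
}

X <- toy_normal(mean = 5, sd = 2)
Y <- toy_normal(mean = 3, sd = 1)
Z <- X + Y  # still a toy_normal, with mean 8 and sd sqrt(5)
```

The result carries its parameters with it, so it can be fed straight into another `+` without drawing a single sample.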

Why It Matters

When you add draws from two normal distributions numerically, you get samples from the sum, but you lose the distribution itself. With algebraic.dist, the result is still a distribution object with proper parameters, and you can keep composing.

You can build complex distributional expressions and simplify them algebraically before ever drawing a sample:

# StockA and StockB are distribution objects defined elsewhere
portfolio <- 0.6*StockA + 0.4*StockB
risk <- sd(portfolio)  # Computed symbolically

For distributions with known closed-form algebra (normal, exponential, certain mixtures), you do not need simulation. You just compute the exact answer. Monte Carlo without the Monte Carlo.
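The portfolio expression can be worked through exactly in a small self-contained sketch. Everything here is an assumption made for illustration: the `toy_normal` class, the `dist_sd` accessor, the return parameters for the two stocks, and above all the independence of StockA and StockB (real assets are usually correlated, which would add a covariance term).

```r
# Hypothetical toy class for illustration only; not the algebraic.dist API.
toy_normal <- function(mean, sd) {
  structure(list(mean = mean, sd = sd), class = "toy_normal")
}

# One Ops group method covers scaling by a number and adding two
# independent normals. Anything else is rejected.
Ops.toy_normal <- function(e1, e2) {
  if (.Generic == "*" && is.numeric(e1)) {
    return(toy_normal(e1 * e2$mean, abs(e1) * e2$sd))
  }
  if (.Generic == "*" && is.numeric(e2)) {
    return(toy_normal(e2 * e1$mean, abs(e2) * e1$sd))
  }
  if (.Generic == "+" && inherits(e1, "toy_normal") &&
      inherits(e2, "toy_normal")) {
    return(toy_normal(e1$mean + e2$mean, sqrt(e1$sd^2 + e2$sd^2)))
  }
  stop("no closed form implemented for this operation")
}

# Hypothetical accessor; base R's sd() is not an S3 generic.
dist_sd <- function(d) d$sd

# Made-up annual return parameters, assumed independent.
StockA <- toy_normal(mean = 0.08, sd = 0.20)
StockB <- toy_normal(mean = 0.05, sd = 0.10)

portfolio <- 0.6 * StockA + 0.4 * StockB
risk <- dist_sd(portfolio)  # sqrt(0.36 * 0.04 + 0.16 * 0.01), no sampling
```

Under these assumptions the risk is sqrt(0.016), obtained from the parameters alone rather than estimated from draws.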

Composition

This is functional programming applied to probability theory. Distributions become composable building blocks:

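  • Mixture models: 0.3*Normal(0,1) + 0.7*Normal(5,2), read here as a weighted combination of densities rather than a weighted sum of random variables (which would itself be normal)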
  • Transformed distributions: exp(Normal(0,1)) is lognormal
  • Conditional distributions: X | (X > 0) for truncation
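As one illustration of the transformed-distribution case above, the sketch below uses R's `Math` group generic so that `exp()` applied to a normal object returns a lognormal object. The classes `toy_normal` and `toy_lognormal` and the generic `dist_mean` are hypothetical names for this example, not part of algebraic.dist.

```r
# Hypothetical toy classes for illustration only; not the algebraic.dist API.
toy_normal <- function(mean, sd) {
  structure(list(mean = mean, sd = sd), class = "toy_normal")
}
toy_lognormal <- function(meanlog, sdlog) {
  structure(list(meanlog = meanlog, sdlog = sdlog), class = "toy_lognormal")
}

# exp() belongs to the Math group generic, so this method intercepts it.
Math.toy_normal <- function(x, ...) {
  if (.Generic == "exp") {
    return(toy_lognormal(meanlog = x$mean, sdlog = x$sd))
  }
  stop("no closed form implemented for ", .Generic)
}

# Hypothetical generic returning the exact mean of each family.
dist_mean <- function(d) UseMethod("dist_mean")
dist_mean.toy_normal <- function(d) d$mean
dist_mean.toy_lognormal <- function(d) exp(d$meanlog + d$sdlog^2 / 2)

L <- exp(toy_normal(mean = 0, sd = 1))  # a toy_lognormal object
m <- dist_mean(L)                       # exp(0.5), from the closed form
```

The transformation changes the class of the object, and the new class knows its own closed-form mean.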

The idea is that computation should mirror mathematical structure. If the math says you can add two normals and get a normal, the code should do the same thing and give you a normal back, not a vector of samples.

This connects to a broader theme in my work. Just as my oblivious computing research uses type theory to enforce privacy invariants, algebraic.dist uses algebraic types to enforce distributional invariants. The algebra tells you what operations are valid and what the results mean.

Implementation

  • Language: R
  • Type system: S3 classes with method dispatch for operations
  • Closed-form operations: Normal, exponential, gamma families
  • Fallback: Monte Carlo for compositions without closed forms
  • Repository: github.com/queelius/algebraic.dist

Related Packages

  • algebraic.mle: Maximum likelihood estimation with algebraic specification
  • numerical.mle: Numerical optimization for MLE when closed forms do not exist
  • likelihood.model: Likelihood-based inference with compositional model building

Most statistical software is imperative. You tell it what to do step by step. algebraic.dist is declarative. You describe the distributional relationships and the computer figures out what to compute. Small composable pieces that do one thing well: preserve distributional structure through transformations.
