Getting Started with algebraic.mle

This tutorial walks you through fitting models using maximum likelihood estimation and performing inference with the algebraic.mle package.

Installation

Install the development version from GitHub:

if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("queelius/algebraic.mle")

You may also want the companion packages:

devtools::install_github("queelius/algebraic.dist")
devtools::install_github("queelius/likelihood.model")

Tutorial 1: Fitting a Conditional Exponential Model

Let’s fit a conditional exponential model where the rate depends on a covariate.

The Model

We’ll model:

$$Y | x \sim \text{Exponential}(\text{rate}(x))$$

where the rate function is:

$$\text{rate}(x) = \exp(\beta_0 + \beta_1 \cdot x)$$

Step 1: Generate Sample Data

First, let’s create a data generating process (DGP):

library(algebraic.mle)
set.seed(1231)

# True parameters
b0 <- -0.1
b1 <- 0.5

# DGP: Exponential + small Gaussian noise
dgp <- function(n, x) {
    rate <- exp(b0 + b1 * x)
    X <- rexp(n, rate)
    W <- rnorm(n, 0, 1e-3)  # Measurement noise
    return(X + W)
}

# Generate observations
n <- 75
df <- data.frame(x = runif(n, -10, 10))
df$y <- sapply(df$x, function(xi) dgp(1, xi))

Step 2: Define the Log-Likelihood

# Response extractor
resp <- function(df) df$y

# Rate function (parameterized)
rate <- function(df, beta) exp(beta[1] + beta[2] * df$x)

# Log-likelihood function
loglik <- function(df, resp, rate) {
    function(beta) {
        sum(dexp(x = resp(df), rate = rate(df, beta), log = TRUE))
    }
}

Step 3: Fit the Model

# Initial parameter guess
par0 <- c(b0 = 0, b1 = 0)

# Fit using optim, wrap in mle object
fit <- mle_numerical(optim(
    par = par0,
    fn = loglik(df, resp, rate),
    control = list(fnscale = -1),  # Maximize
    hessian = TRUE
))

# View results
summary(fit)

Step 4: Inference

# Parameter estimates
params(fit)

# Confidence intervals
confint(fit)

# Standard errors
se(fit)

Step 5: Visualize the Fit

# Plot data
plot(df$x, df$y, xlab = "x", ylab = "y", main = "Conditional Exponential Fit")

# Add fitted conditional mean: E[Y|x] = 1/rate(x)
x_seq <- seq(-10, 10, 0.1)
b0_hat <- params(fit)[1]
b1_hat <- params(fit)[2]
y_hat <- 1 / exp(b0_hat + b1_hat * x_seq)

lines(x_seq, y_hat, col = "blue", lwd = 2)

# Add true conditional mean for comparison
y_true <- 1 / exp(b0 + b1 * x_seq)
lines(x_seq, y_true, col = "green", lwd = 2, lty = 2)

legend("topright", legend = c("Fitted", "True"),
       col = c("blue", "green"), lwd = 2, lty = c(1, 2))

Tutorial 2: Model Comparison with Likelihood Ratio Tests

A powerful feature of MLE is the ability to compare nested models.

Testing a Restricted Model

Let’s test the hypothesis that β₀ = 0 (no intercept):

# Null model: rate(x) = exp(b1 * x)
rate_null <- function(df, b1) exp(b1 * df$x)

fit_null <- mle_numerical(optim(
    par = 0,
    fn = loglik(df, resp, rate_null),
    control = list(fnscale = -1),
    hessian = TRUE,
    method = "BFGS"
))

# Likelihood ratio test
lrt_stat <- -2 * (loglik_val(fit_null) - loglik_val(fit))
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

cat("LRT statistic:", lrt_stat, "\n")
cat("p-value:", p_value, "\n")

Model Selection with AIC

cat("Full model AIC:", aic(fit), "\n")
cat("Null model AIC:", aic(fit_null), "\n")

# Lower AIC is better
if (aic(fit) < aic(fit_null)) {
    cat("Full model is preferred\n")
} else {
    cat("Null model is preferred\n")
}

Tutorial 3: Bootstrap Confidence Intervals

When the asymptotic normal approximation may not hold (small samples, boundary parameters), use bootstrap:

library(boot)

# Bootstrap fitting function
boot_fn <- function(data, indices) {
    d <- data[indices, ]
    fit <- mle_numerical(optim(
        par = c(0, 0),
        fn = loglik(d, resp, rate),
        control = list(fnscale = -1),
        hessian = TRUE
    ))
    params(fit)
}

# Run bootstrap
boot_results <- boot(df, boot_fn, R = 1000)

# Bootstrap confidence intervals
boot.ci(boot_results, type = c("perc", "bca"), index = 1)  # For b0
boot.ci(boot_results, type = c("perc", "bca"), index = 2)  # For b1

Next Steps

Explore the Examples for more use cases
Read the API Documentation for complete function reference
See likelihood.model for building complex likelihood specifications