Back to algebraic.mle

Tutorials

Step-by-step guides for using algebraic.mle.

Getting Started with algebraic.mle

This tutorial walks you through fitting models using maximum likelihood estimation and performing inference with the algebraic.mle package.

Installation

Install the development version from GitHub:

if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("queelius/algebraic.mle")

You may also want the companion packages:

devtools::install_github("queelius/algebraic.dist")
devtools::install_github("queelius/likelihood.model")

Tutorial 1: Fitting a Conditional Exponential Model

Let’s fit a conditional exponential model where the rate depends on a covariate.

The Model

We’ll model:

YxExponential(rate(x))Y | x \sim \text{Exponential}(\text{rate}(x))

where the rate function is:

rate(x)=exp(β0+β1x)\text{rate}(x) = \exp(\beta_0 + \beta_1 \cdot x)

Step 1: Generate Sample Data

First, let’s create a data generating process (DGP):

library(algebraic.mle)
set.seed(1231)

# True parameters
b0 <- -0.1
b1 <- 0.5

# DGP: Exponential + small Gaussian noise
dgp <- function(n, x) {
    rate <- exp(b0 + b1 * x)
    X <- rexp(n, rate)
    W <- rnorm(n, 0, 1e-3)  # Measurement noise
    return(X + W)
}

# Generate observations
n <- 75
df <- data.frame(x = runif(n, -10, 10))
df$y <- sapply(df$x, function(xi) dgp(1, xi))

Step 2: Define the Log-Likelihood

# Response extractor
resp <- function(df) df$y

# Rate function (parameterized)
rate <- function(df, beta) exp(beta[1] + beta[2] * df$x)

# Log-likelihood function
loglik <- function(df, resp, rate) {
    function(beta) {
        sum(dexp(x = resp(df), rate = rate(df, beta), log = TRUE))
    }
}

Step 3: Fit the Model

# Initial parameter guess
par0 <- c(b0 = 0, b1 = 0)

# Fit using optim, wrap in mle object
fit <- mle_numerical(optim(
    par = par0,
    fn = loglik(df, resp, rate),
    control = list(fnscale = -1),  # Maximize
    hessian = TRUE
))

# View results
summary(fit)

Step 4: Inference

# Parameter estimates
params(fit)

# Confidence intervals
confint(fit)

# Standard errors
se(fit)

Step 5: Visualize the Fit

# Plot data
plot(df$x, df$y, xlab = "x", ylab = "y", main = "Conditional Exponential Fit")

# Add fitted conditional mean: E[Y|x] = 1/rate(x)
x_seq <- seq(-10, 10, 0.1)
b0_hat <- params(fit)[1]
b1_hat <- params(fit)[2]
y_hat <- 1 / exp(b0_hat + b1_hat * x_seq)

lines(x_seq, y_hat, col = "blue", lwd = 2)

# Add true conditional mean for comparison
y_true <- 1 / exp(b0 + b1 * x_seq)
lines(x_seq, y_true, col = "green", lwd = 2, lty = 2)

legend("topright", legend = c("Fitted", "True"),
       col = c("blue", "green"), lwd = 2, lty = c(1, 2))

Tutorial 2: Model Comparison with Likelihood Ratio Tests

A powerful feature of MLE is the ability to compare nested models.

Testing a Restricted Model

Let’s test the hypothesis that β₀ = 0 (no intercept):

# Null model: rate(x) = exp(b1 * x)
rate_null <- function(df, b1) exp(b1 * df$x)

fit_null <- mle_numerical(optim(
    par = 0,
    fn = loglik(df, resp, rate_null),
    control = list(fnscale = -1),
    hessian = TRUE,
    method = "BFGS"
))

# Likelihood ratio test
lrt_stat <- -2 * (loglik_val(fit_null) - loglik_val(fit))
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

cat("LRT statistic:", lrt_stat, "\n")
cat("p-value:", p_value, "\n")

Model Selection with AIC

cat("Full model AIC:", aic(fit), "\n")
cat("Null model AIC:", aic(fit_null), "\n")

# Lower AIC is better
if (aic(fit) < aic(fit_null)) {
    cat("Full model is preferred\n")
} else {
    cat("Null model is preferred\n")
}

Tutorial 3: Bootstrap Confidence Intervals

When the asymptotic normal approximation may not hold (small samples, boundary parameters), use bootstrap:

library(boot)

# Bootstrap fitting function
boot_fn <- function(data, indices) {
    d <- data[indices, ]
    fit <- mle_numerical(optim(
        par = c(0, 0),
        fn = loglik(d, resp, rate),
        control = list(fnscale = -1),
        hessian = TRUE
    ))
    params(fit)
}

# Run bootstrap
boot_results <- boot(df, boot_fn, R = 1000)

# Bootstrap confidence intervals
boot.ci(boot_results, type = c("perc", "bca"), index = 1)  # For b0
boot.ci(boot_results, type = c("perc", "bca"), index = 2)  # For b1

Next Steps