SIUe - Computational Statistics (STAT 575) - Problem Set 2

Problem 1

Derive the EM algorithm for right-censored normal data with known variance, say \(\sigma^2 = 1\). Consider \(Y_i\)’s that are i.i.d. from a \(N(\theta, 1)\), \(i=1,2,\ldots, n\). We observe \((x_1, \ldots, x_n)\) and \((\delta_1, \ldots, \delta_n)\), where \(x_i = \min(y_i,c)\) and \(\delta_i = I(y_i < c)\). Let \(C\) be the total number of censored (incomplete) observations. We denote the missing data as \(\{Z_i : \delta_i = 0\}\).

Part (a)

Derive the complete log-likelihood, \(\ell(\theta | Y)\).

The unobserved random variates \(\{Y_i\}\) are i.i.d. normally distributed, \[ Y_i \sim f_{Y_i}(y | \theta) \] where \[ f_{Y_i}(y | \theta) = (2 \pi)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(y - \theta)^2\right). \]

The likelihood function is therefore \[\begin{align} L(\theta | \{y_i\}) &= \prod_{i=1}^{n} (2 \pi)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(y_i - \theta)^2\right)\\ &= (2 \pi)^{-\frac{n}{2}} \exp\left(-\sum_{i=1}^{n} \frac{1}{2}(y_i - \theta)^2\right). \end{align}\]

Taking the logarithm of \(L\), \[\begin{align} \ell(\theta | \{y_i\}) &= \log L(\theta | \{y_i\})\\ &= -\frac{n}{2} \log (2 \pi) - \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta)^2\\ &= -\frac{n}{2} \log (2 \pi) - \frac{1}{2} \sum_{i=1}^{n} y_i^2 + \theta \sum_{i=1}^{n} y_i - \frac{n}{2} \theta^2. \end{align}\]

Anticipating that we will be maximizing the complete log-likelihood with respect to \(\theta\), we put any terms that are not a function of \(\theta\) into \(k\), obtaining the result \[ \ell(\theta | \{y_i\}) = k + \theta \sum_{i=1}^{n} y_i - \frac{n}{2} \theta^2. \]

Part (b)

Show the conditional expectations \[ E(Y | x, \delta=1, \theta^{(t)}) = x \] and \[ E(Y | x, \delta=0, \theta^{(t)}) = E(Y | Y > x) = \theta^{(t)} + \frac{\phi(x-\theta^{(t)})}{1 - \Phi(x-\theta^{(t)})}, \] where \(\phi\) and \(\Phi\) are the pdf and cdf of the standard normal distribution.

Given \(\delta = 1\), the observation is uncensored, so it is known that \(Y\) realized the value \(x\). The expectation of the constant \(x\) is \(x\), so \(E(Y | x, \delta=1, \theta^{(t)}) = x\).

If \(\delta = 0\), \(Y\) is censored, i.e., \(Y > x\). To take its expectation, we first need to derive the conditional distribution of \(Y\) given \(Y > x\) and \(\theta^{(t)}\).

The probability \(\Pr(Y \leq y | Y > x)\) is given by \[ \Pr(Y \leq y | Y > x) = \Pr(x < Y \leq y) / \Pr(Y > x), \] which may be rewritten as \[ \Pr(Y \leq y | Y > x) = \frac{F_Y(y | \theta^{(t)}) - F_Y(x | \theta^{(t)})}{1 - F_Y(x | \theta^{(t)})}, \] where \(F_Y(\cdot | \theta^{(t)})\) is the cdf of the normal distribution with \(\sigma=1\) and \(\mu=\theta^{(t)}\).

We may rewrite \(F_Y\) in terms of the standard normal cdf, \[ F_{Y}(y|\theta^{(t)}) = \Phi(y - \theta^{(t)}), \] so the conditional distribution of \(Y | Y > x\) becomes \[ \Pr(Y \leq y | Y > x) = \frac{\Phi(y - \theta^{(t)}) - \Phi(x - \theta^{(t)})}{1 - \Phi(x - \theta^{(t)})}. \] After further simplifying, we obtain the cdf of \(Y | x\), \[ F_{Y|x}(y|\theta^{(t)}) = 1 - \frac{1- \Phi(y - \theta^{(t)})}{1 - \Phi(x - \theta^{(t)})}, \] which has density \[ f_{Y}(y|x,\theta^{(t)}) = \frac{\phi(y - \theta^{(t)})}{1 - \Phi(x - \theta^{(t)})} I(y > x). \]

The expectation of \(Y|(x,\theta^{(t)})\) is given by \[\begin{align} E(Y|x,\theta^{(t)}) &= \int_{x}^{\infty} y f_{Y}(y|x,\theta^{(t)}) dy\\ &= \int_{x}^{\infty} y \left(\frac{\phi(y - \theta^{(t)})}{1 - \Phi(x - \theta^{(t)})}\right) dy\\ &= \frac{1}{{1 - \Phi(x - \theta^{(t)})}}\int_{x}^{\infty} y \phi(y - \theta^{(t)}) dy. \end{align}\]

Analytically, this is a tricky integral. It would be easy to evaluate numerically, but we seek a closed-form solution. A search online turned up an interesting trick that makes the integration tractable.

Let \(f\) and \(F\) respectively denote the pdf and cdf of \(Y \sim N(\theta^{(t)}, 1)\). Then, \[ \frac{df}{dy} = -(y - \theta^{(t)}) f(y) \] and \[ \int_{a}^{b} \frac{df}{dy} dy = f(b) - f(a). \]

Then, \[\begin{align} E(Y|x,\theta^{(t)}) &= \frac{1}{1 - F(x)}\int_{x}^{\infty} y f(y) dy\\ &= -\frac{1}{1 - F(x)}\int_{x}^{\infty} - (y -\theta^{(t)}) f(y) dy + \frac{\theta^{(t)}}{1-F(x)}\int_{x}^{\infty} f(y) dy\\ &= -\frac{1}{1 - F(x)}\int_{x}^{\infty} \frac{df}{dy} dy + \frac{\theta^{(t)}}{1-F(x)} (1-F(x))\\ &= -\frac{1}{1 - F(x)}\left(f(\infty) - f(x)\right) + \theta^{(t)}\\ &= \frac{f(x)}{1 - F(x)} + \theta^{(t)}. \end{align}\]

We may rewrite the last line as \[ E(Y|x,\theta^{(t)}) = \theta^{(t)} + \frac{\phi(x-\theta^{(t)})}{1 - \Phi(x-\theta^{(t)})}. \]
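As a quick numerical sanity check (illustrative only; the test values of \(x\) and \(\theta^{(t)}\) below are arbitrary), the closed form agrees with direct numerical integration of the truncated normal density:

# illustrative check: closed form vs. numerical integration of the
# truncated normal mean (theta_t and x are arbitrary test values)
theta_t <- 0.5
x <- 1.2
closed <- theta_t + dnorm(x - theta_t) / (1 - pnorm(x - theta_t))
num_int <- integrate(function(y) y * dnorm(y, mean=theta_t, sd=1), x, Inf)$value /
  (1 - pnorm(x, mean=theta_t, sd=1))
c(closed, num_int) # the two values agree to numerical precision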

Part (c)

Derive the \(E\)-step and \(M\)-step using parts (a) and (b). Give the updating equation.

E-step

The \(E\)-step entails taking the conditional expectation of the complete log-likelihood function \(\ell(\theta | \{Y_i\})\) given the observed data \(\{x_i\}\) and \(\{\delta_i\}\) and the current iterate \(\theta^{(t)}\).

\[\begin{align} Q(\theta | \theta^{(t)}) &= E_{Y_i | x_i,\delta_i}\left(\ell(\theta | \{Y_i\})\right)\\ &= E_{Y_i | x_i,\delta_i}\left(k + \theta \sum_{i=1}^{n} Y_i - \frac{n}{2} \theta^2\right)\\ &= k - \frac{n}{2}\theta^2 + \theta \sum_{i=1}^{n} E_{Y_i | x_i,\delta_i}(Y_i). \end{align}\]

We have already solved the expectation of \(Y_i\) given \(x_i\) and \(\delta_i\). We rewrite \(Q\) by substituting \(E(Y_i | x_i, \delta_i)\) with its previously found solution, \[ Q(\theta | \theta^{(t)}) = k - \frac{n}{2}\theta^2 + \theta \sum_{i=1}^{n} \left[ \delta_i x_i + (1-\delta_i) \left(\theta^{(t)} + \frac{\phi(x_i-\theta^{(t)})}{1 - \Phi(x_i-\theta^{(t)})}\right) \right]. \]

Letting \(C = \sum_{i=1}^{n} (1 - \delta_i)\) and \(R = \sum_{i=1}^{n} \delta_i x_i\), and expanding the sum, \[ Q(\theta | \theta^{(t)}) = k - \frac{n}{2}\theta^2 + C \theta \theta^{(t)} + R \theta + \theta \sum_{i=1}^{n} \frac{(1-\delta_i)\phi(x_i-\theta^{(t)})}{1 - \Phi(x_i-\theta^{(t)})}. \]

M-step

We wish to solve \[ \theta^{(t+1)} = \operatorname{arg\,max}_{\theta} Q(\theta | \theta^{(t)}) \] by solving \[ \left. \frac{d Q(\theta | \theta^{(t)})}{d \theta} \right\vert_{\theta=\theta^{(t+1)}} = 0, \] which may be written as \[ -n\theta^{(t+1)} + C \theta^{(t)} + R + \sum_{i=1}^{n} \frac{(1-\delta_i)\phi(x_i-\theta^{(t)})}{1 - \Phi(x_i-\theta^{(t)})} = 0. \]

Solving for \(\theta^{(t+1)}\) yields the updating equation \[ \theta^{(t+1)} = \frac{R}{n} + \frac{C}{n} \theta^{(t)} + \frac{1}{n}\sum_{i=1}^{n}\frac{(1-\delta_i)\phi(x_i-\theta^{(t)})}{1 - \Phi(x_i-\theta^{(t)})}, \] where \[ R = \sum_{i=1}^{n} \delta_i x_i \quad \text{and} \quad C = \sum_{i=1}^{n} (1-\delta_i). \]
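Before turning to the V.A. data, here is a minimal sketch of this updating equation on simulated data (the true \(\theta = 1\), the censoring point \(c = 1.5\), and the sample size are arbitrary choices for illustration). For censored observations \(x_i = c\), so the sum collapses to \(C\) identical terms:

# illustrative check of the updating equation on simulated right-censored
# normal data; theta_true, cens, and n are arbitrary test values
set.seed(42)
n <- 5000
theta_true <- 1
cens <- 1.5
y <- rnorm(n, mean=theta_true, sd=1)
x <- pmin(y, cens)
delta <- as.numeric(y < cens)
R <- sum(x[delta == 1])
C <- sum(delta == 0)
theta <- mean(x) # starting value
for (iter in 1:100)
{
  # censored x_i all equal cens, so the sum has C identical terms
  theta <- R/n + (C/n)*theta + (C/n)*dnorm(cens - theta)/(1 - pnorm(cens - theta))
}
theta # should be close to theta_true = 1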

Part (d)

Use your algorithm on the V.A. data to find the MLE of \(\mu\). Take the log of the event times first and standardize by sample standard deviation. You may simply use the censored data sample mean as your starting value.

In the following R code, we implement the updating equation derived in the previous step. We encapsulate the procedure into a function that takes a censored set, an uncensored set, a starting value (\(\theta^{(1)}\)), and an \(\epsilon\) value that controls the stopping condition.

# assuming the uncensored and censored data are distributed normally,
# we use the EM algorithm to derive an estimator given censored and uncensored
# data.
mean_normal_censored_estimator_em <- function(uncensored,censored,theta,eps=1e-6,debug=TRUE)
{
  # log-transform and standardize by the sample standard deviation of the logs
  dev <- sd(log(c(uncensored,censored)))
  censored <- log(censored) / dev
  uncensored <- log(uncensored) / dev
  theta <- log(theta) / dev
  
  n <- length(censored) + length(uncensored)
  C <- length(censored)
  R <- sum(uncensored)
  
  # sum of the hazard terms phi(x_i - theta) / (1 - Phi(x_i - theta))
  # over the censored observations
  s <- function(theta)
  {
    sum(dnorm(censored,mean=theta,sd=1) / (1 - pnorm(censored,mean=theta,sd=1)))
  }
  
  i <- 1
  repeat
  {
    theta.new <- R/n + C/n * theta + (1/n)*s(theta)
    if (debug) { cat("theta[", i, "] =",theta,", theta[", i+1, "] =",theta.new,"\n") }
    if (abs(theta.new - theta) < eps)
    {
      # undo the standardization and the log transform
      return(exp(theta.new * dev))
    }
    i <- i + 1
    theta <- theta.new
  }
}

We apply this procedure to the indicated data set.

library(MASS) # has VA data
VAs <- subset(VA,prior==0)
# in the VA data, status == 0 indicates a censored survival time
censored <- VAs$status == 0
censored_xs <- VAs[censored,c("stime")]
uncensored_xs <- VAs[!censored,c("stime")]

mu <- mean(uncensored_xs)
cat("mean of the uncensored sample is ", mu, ".")
## mean of the uncensored sample is  112.1648 .
sol <- mean_normal_censored_estimator_em(uncensored_xs,censored_xs,mu)
## theta[ 1 ] = 3.857928 , theta[ 2 ] = 3.424258 
## theta[ 2 ] = 3.424258 , theta[ 3 ] = 3.415443 
## theta[ 3 ] = 3.415443 , theta[ 4 ] = 3.415286 
## theta[ 4 ] = 3.415286 , theta[ 5 ] = 3.415283 
## theta[ 5 ] = 3.415283 , theta[ 6 ] = 3.415283
sol
## [1] 65.2625

We see that our estimate on the original time scale is \(\hat{\theta} = 65.2624985\). (The estimate on the standardized log scale, before back-transformation, was \(3.415283\).)

This estimate is somewhat lower than the uncensored sample mean, which at first made me suspect an error in the updating equation. Note, however, that the estimate is computed on the log scale and then exponentiated, yielding a geometric-mean-type quantity, which is always at most the corresponding arithmetic mean; a lower value is therefore expected.

Problem 2 (Problem 4.2)


Part (a)

There are \(N=1500\) gay men in the survey sample, where \(X_i\) denotes the \(i\)-th person’s reported number of risky sexual encounters in the previous \(30\) days. Thus, we observe a sample \(\vec{X} = (X_1,X_2,\ldots, X_N)\).

We assume there are \(3\) groups in the population, indexed \(z=1\), \(t=2\), and \(p=3\). Group \(1\) members report \(0\) risky sexual encounters regardless of the truth; the probability of belonging to group \(1\) is denoted by \(\alpha\).

Group \(2\) members accurately report risky sexual encounters and represent typical behavior; the probability of belonging to group \(2\) is denoted by \(\beta\). We assume this group’s number of encounters follows a Poisson distribution with mean \(\mu\).

Group \(3\) members accurately report risky sexual encounters and represent high-risk behavior; the probability of belonging to group \(3\) is \(\gamma = 1-\alpha-\beta\). We assume this group’s number of encounters follows a Poisson distribution with mean \(\lambda\).

This represents a finite mixture model with pmf \[ X_i \sim f(x | \vec{\theta}) = \alpha I(x=0) + \beta \operatorname{POI}(x | \mu) + (1-\alpha-\beta)\operatorname{POI}(x | \lambda) \] with parameter vector \[ \vec{\theta} = (\alpha,\beta,\mu,\lambda)'. \]

Let the uncertain group that the \(i\)-th person belongs to be denoted by \(Z_i\). If we observe group membership data, \(X_i | Z_i = z_i\), then \[\begin{align} X_i | Z_i = 1 &\sim \text{point mass at } 0,\\ X_i | Z_i = 2 &\sim \operatorname{POI}(\mu),\\ X_i | Z_i = 3 &\sim \operatorname{POI}(\lambda), \end{align}\] where \[ Z_i \sim f_{Z_i}(z_i | \vec{\theta}) = \Pr(Z_i=z_i) = \begin{cases} \alpha & z_i = 1,\\ \beta & z_i = 2,\\ \gamma=1-\alpha-\beta & z_i = 3, \end{cases} \] and thus \[ f_{X_i,Z_i}(x_i,z_i | \vec{\theta}) = \alpha I(x_i=0) I(z_i = 1) + \beta \operatorname{POI}(x_i | \mu) I(z_i = 2) + (1-\alpha-\beta) \operatorname{POI}(x_i | \lambda) I(z_i=3). \]

The likelihood function is thus given by \[ \mathcal{L}(\vec{\theta} | \vec{X}, \vec{Z}) = \prod_{i=1}^{N} f_{X_i,Z_i}(x_i,z_i | \vec{\theta}), \] which may be rewritten as \[ \mathcal{L}(\vec{\theta} | \vec{X}, \vec{Z}) = \left(\prod_{\{i | z_i = 1\}} \alpha I(x_i=0)\right) \left(\prod_{\{i | z_i = 2\}} \beta \frac{\mu^{x_i} e^{-\mu}}{x_i!}\right) \left(\prod_{\{i | z_i = 3\}} \gamma \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}\right). \]

We wish to rewrite this so that the data is explicitly represented. First, we do the transformation \[ \mathcal{L}(\vec{\theta} | \vec{X}, \vec{Z}) = \left(\prod_{\{i | z_i = 1, x_i = 0 \}} \alpha\right) \prod_{k=0}^{16} \left(\prod_{\{i | z_i = 2, x_i = k\}} \beta \frac{\mu^{k} e^{-\mu}}{k!}\right) \prod_{k=0}^{16} \left(\prod_{\{i | z_i = 3, x_i = k\}} \gamma \frac{\lambda^{k} e^{-\lambda}}{k!}\right). \]

We let \(n_{j,k}\) denote the (unobserved) cardinality of \(\{i | z_i = j, x_i = k\}\), thus \[ \mathcal{L}(\vec{\theta} | \{n_{j,k}\}) = \alpha^{n_{1,0}} \prod_{k=0}^{16} \beta^{n_{2,k}} \frac{\mu^{k n_{2,k}} e^{-\mu n_{2,k}}}{(k!)^{n_{2,k}}} \prod_{k=0}^{16} \gamma^{n_{3,k}} \frac{\lambda^{k n_{3,k}} e^{-\lambda n_{3,k}}}{(k!)^{n_{3,k}}} \] is the complete likelihood. The complete log-likelihood is thus \[ \ell(\vec{\theta} | \{n_{j,k}\}) = n_{1,0} \log \alpha + \sum_{k=0}^{16} \log \left(\beta^{n_{2,k}} \frac{\mu^{k n_{2,k}} e^{-\mu n_{2,k}}}{(k!)^{n_{2,k}}}\right) + \sum_{k=0}^{16} \log \left(\gamma^{n_{3,k}} \frac{\lambda^{k n_{3,k}} e^{-\lambda n_{3,k}}}{(k!)^{n_{3,k}}}\right), \] which simplifies to \[\begin{equation} \begin{split} \ell(\vec{\theta} | \{n_{j,k}\}) = n_{1,0} \log \alpha + \sum_{k=0}^{16} &n_{2,k}(\log \beta + k \log \mu - \mu - \log k!) + \\ &n_{3,k}(\log \gamma + k \log \lambda - \lambda - \log k!). \end{split} \end{equation}\]

Anticipating taking \(\frac{d \ell}{d \vec{\theta}}\) to solve for the maximum of the log-likelihood, we remove any terms that are not a function of \(\vec{\theta}\), resulting in the kernel \[ \ell(\vec{\theta} | \{n_{j,k}\}) = n_{1,0} \log \alpha + \sum_{k=0}^{16} \left\{ n_{2,k}(\log \beta + k \log \mu - \mu) + n_{3,k}(\log \gamma + k \log \lambda - \lambda) \right\}. \]
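Before deriving the E-step, here is a minimal simulation sketch of the complete-data model (the parameter values below are arbitrary placeholders, not estimates):

# illustrative simulation from the three-group mixture; the parameter
# values are arbitrary placeholders, not estimates
set.seed(1)
N_sim <- 1500
alpha0 <- 0.1; beta0 <- 0.6; mu0 <- 1.5; lambda0 <- 6
z <- sample(1:3, N_sim, replace=TRUE, prob=c(alpha0, beta0, 1-alpha0-beta0))
x <- ifelse(z == 1, 0, ifelse(z == 2, rpois(N_sim, mu0), rpois(N_sim, lambda0)))
table(x)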

E-step

The conditional expectation to solve in the EM algorithm is given by \[ Q(\vec{\theta} | \vec{\theta}^{(t)}) = E(\ell(\vec{\theta} | \{n_{j,k}\})), \] where \(\{n_{j,k}\}\) are random and the observed counts \(\{n_k\}\) and \(\vec{\theta}^{(t)}\) are given. We rewrite this as \[ Q(\vec{\theta} | \vec{\theta}^{(t)}) = E \left( n_{1,0} \log \alpha + \sum_{k=0}^{16} \left\{ n_{2,k}(\log \beta + k \log \mu - \mu) + n_{3,k}(\log \gamma + k \log \lambda - \lambda) \right\} \right). \]

Using the linearity of expectation, we rewrite the above as \[ Q(\vec{\theta} | \vec{\theta}^{(t)}) = E(n_{1,0}) \log \alpha + \sum_{k=0}^{16} \left\{ E(n_{2,k})(\log \beta + k \log \mu - \mu) + E(n_{3,k})(\log \gamma + k \log \lambda - \lambda) \right\}, \] where every expectation is conditional on \(\{n_k\}\) and \(\vec{\theta}^{(t)}\).

Consider \(E\!\left(n_{2,k} | \{n_k\}, \vec{\theta}^{(t)}\right)\). To solve this expectation, we must first derive the distribution of \(n_{2,k}\).

Suppose \(x_j = k\); then the probability that the \(j\)-th person belongs to group \(2\) is given by \[ \Pr(Z_j = 2 | x_j = k) = \Pr(Z_j = 2) \Pr(x_j = k | Z_j = 2) / \Pr(x_j = k). \] We note that \(\Pr(x_j = k)\) is the marginal probability \[ \pi_k(\vec{\theta}) = \alpha I(k=0) + \beta \operatorname{POI}(k | \mu) + (1-\alpha-\beta) \operatorname{POI}(k | \lambda), \] \(\Pr(Z_j = 2)\) is by definition \(\beta\), and \(\Pr(x_j = k | Z_j = 2)\) is \(f_{X_j|Z_j}(k | Z_j=2) = \operatorname{POI}(k | \mu)\).

Making the substitutions yields the result \[ t_k(\vec{\theta}) = \Pr(Z_j = 2 | x_j = k) = \beta \operatorname{POI}(k | \mu) / \pi_k(\vec{\theta}). \]

Assuming the \(\{X_i\}\) are i.i.d., the distribution of \(n_{2,k}\) given \(n_k\) and \(\vec{\theta}^{(t)}\) is binomial with \(n_k\) trials and success probability \(t_k(\vec{\theta}^{(t)})\). Thus, \[ E(n_{2,k}) = n_k t_k(\vec{\theta}^{(t)}). \] The same logic holds for \(n_{3,k}\) and \(n_{1,0}\), with \[ p_k(\vec{\theta}) = (1-\alpha-\beta) \operatorname{POI}(k | \lambda) / \pi_k(\vec{\theta}) \quad \text{and} \quad z_0(\vec{\theta}) = \alpha / \pi_0(\vec{\theta}), \] and thus \[ E(n_{3,k}) = n_k p_k(\vec{\theta}^{(t)}) \] and \[ E(n_{1,0}) = n_0 z_0(\vec{\theta}^{(t)}), \] which means \[ Q(\vec{\theta} | \vec{\theta}^{(t)}) = n_0 z_0(\vec{\theta}^{(t)}) \log \alpha + \sum_{k=0}^{16} \left\{ n_k t_k(\vec{\theta}^{(t)})(\log \beta + k \log \mu - \mu) + n_k p_k(\vec{\theta}^{(t)})(\log \gamma + k \log \lambda - \lambda) \right\}. \]

M-step

We wish to solve \[ \vec{\theta}^{(t+1)} = \operatorname{arg\,max}_{\vec{\theta}} Q(\vec{\theta} | \vec{\theta}^{(t)}) \] by solving \[ \left. \nabla Q(\vec{\theta} | \vec{\theta}^{(t)}) \right\vert_{\vec{\theta}=\vec{\theta}^{(t+1)}} = \vec{0}. \]

We use the Lagrangian to impose the restriction \(\alpha + \beta + \gamma = 1\), thus we seek to perform the constrained maximization of \[ Q_l(\vec{\theta},c | \vec{\theta}^{(t)}) = Q(\vec{\theta} | \vec{\theta}^{(t)}) + c(1-\alpha-\beta-\gamma). \]

Thus, when we solve for \(\alpha\), \[ \frac{\partial Q_l}{\partial \alpha} = \frac{n_0 z_0(\theta^{(t)})}{\alpha} - c = 0, \] we get the result \[ \alpha^{(t+1)} = \frac{1}{c} n_0 z_0(\theta^{(t)}). \]

Similar results hold for \(\beta\) and \(\gamma\), obtaining \[ \beta^{(t+1)} = \frac{1}{c} \sum_{k=0}^{16} n_k t_k(\theta^{(t)}) \] and \[ \gamma^{(t+1)} = \frac{1}{c} \sum_{k=0}^{16} n_k p_k(\theta^{(t)}). \]

This does not look too promising until we realize that \[ n_0 z_0(\theta^{(t)}) + \sum_{k=0}^{16} n_k t_k(\theta^{(t)}) + \sum_{k=0}^{16} n_k p_k(\theta^{(t)}) = N, \] which follows because \(z_0 + t_0 + p_0 = 1\), \(t_k + p_k = 1\) for \(k \geq 1\), and \(\sum_k n_k = N\).

Summing the three update equations gives \(\alpha^{(t+1)}+\beta^{(t+1)}+\gamma^{(t+1)} = N/c\), and since the proportions must sum to \(1\), we get \(c = N\). Making this substitution obtains the result \[\begin{align} \alpha^{(t+1)} &= \frac{1}{N} n_0 z_0(\theta^{(t)})\\ \beta^{(t+1)} &= \frac{1}{N} \sum_{k=0}^{16} n_k t_k(\theta^{(t)})\\ \gamma^{(t+1)} &= \frac{1}{N} \sum_{k=0}^{16} n_k p_k(\theta^{(t)}). \end{align}\]

Solving for the estimator of \(\mu\) at iteration \(t+1\), \[\begin{align} \left. \frac{\partial Q_l}{\partial \mu} \right\vert_{\mu=\mu^{(t+1)}} &= 0\\ \sum_{k=0}^{16} n_k t_k(\theta^{(t)})(k/\mu^{(t+1)}-1) &= 0\\ \frac{1}{\mu^{(t+1)}} \sum_{k=0}^{16} n_k t_k(\theta^{(t)}) k &= \sum_{k=0}^{16} n_k t_k(\theta^{(t)})\\ \mu^{(t+1)} &= \frac{\sum_{k=0}^{16} k n_k t_k(\theta^{(t)})}{\sum_{k=0}^{16} n_k t_k(\theta^{(t)})}. \end{align}\]

Essentially the same derivation holds for \(\lambda\), and thus \[ \lambda^{(t+1)} = \frac{\sum_{k=0}^{16} k n_k p_k(\theta^{(t)})}{\sum_{k=0}^{16} n_k p_k(\theta^{(t)})}. \]

Part (b)

Estimate the parameters of the model, using the observed data.

# we observe n = (n0,n1,...,n16)
ns <- c(379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
N <- sum(ns)

# theta := (alpha, beta, mu, lambda)'
# note that there is an implicit parameter gamma s.t.
# alpha + beta + gamma = 1
# the initial value assumes each category z, t, or p
# is equally probable, and so we let
#     (alpha^(0),beta^(0)) = (1/3,1/3)
# and mu^(0) and lambda^(0) are just arbitrarily chosen to be 2 and 3,
# with the insight that group 3 is more risky than group 2.
theta <- c(1/3,1/3,2,3)

# theta := (alpha, beta, mu, lambda)
# note: the Poisson pmf terms omit the 1/i! factor; it cancels in the
# ratios z0, t, and p below (and 0! = 1, so the alpha term at i = 0 is
# unaffected), so the membership probabilities are still correct
Pi <- function(i,theta)
{
  res <- 0
  if (i == 0)
    res <- theta[1]
  
  res <- res + theta[2] * theta[3]^i * exp(-theta[3])
  res <- res + (1 - theta[1] - theta[2]) * theta[4]^i * exp(-theta[4])
  res
}

# z0(theta) = Pr(Z = 1 | X = 0)
z0 <- function(theta)
{
  theta[1] / Pi(0,theta)
}

# t(i,theta) = Pr(Z = 2 | X = i); note this t() shadows base::t (matrix
# transpose) within this script, which is harmless here
t <- function(i,theta)
{
  theta[2] * theta[3]^i * exp(-theta[3]) / Pi(i,theta)
}

# p(i,theta) = Pr(Z = 3 | X = i)
p <- function(i,theta)
{
  (1-theta[1] - theta[2]) * theta[4]^i * exp(-theta[4]) / Pi(i,theta)
}

# update algorithm, based on EM algorithm
update <- function(theta,ns)
{
  # note: n0 := ns[1] instead of ns[0] since R does not use zero-based indexes
  alpha <- ns[1] * z0(theta) / N
  beta <- 0
  mu_num <- 0
  mu_denom <- 0
  
  lam_num <- 0
  lam_denom <- 0

  for (i in 0:16)
  {
    ti <- t(i,theta)
    pi <- p(i,theta)
    
    beta <- beta + ns[i+1] * ti
    
    mu_num <- mu_num + i * ns[i+1] * ti
    mu_denom <- mu_denom + ns[i+1] * ti
    
    lam_num <- lam_num + i * ns[i+1] * pi
    lam_denom <- lam_denom + ns[i+1] * pi
  }
  
  beta <- beta / N
  mu <- mu_num / mu_denom
  lam <- lam_num / lam_denom
  
  c(alpha,beta,mu,lam)
}

# run the EM updates for a fixed number of steps
em <- function(theta,ns,steps=10000,debug=TRUE)
{
  for(i in 1:steps)
  {
    theta <- update(theta,ns)
    if (debug && i %% 1000 == 0)
    {
      cat("iteration =",i," theta = (",theta,")'\n")
    }
  }
  theta
}

# solution theta = (alpha, beta, mu, lambda)
sol <- em(theta,ns,10000,T)
## iteration = 1000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 2000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 3000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 4000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 5000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 6000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 7000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 8000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 9000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 10000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'

We see that the solution is \(\hat{\alpha} = 0.1221661\), \(\hat{\beta} = 0.5625419\), \(\hat{\mu} = 1.4674746\), and \(\hat{\lambda} = 5.9388889\).
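As a quick sanity check (illustrative), the identity used in the M-step derivation, \(n_0 z_0 + \sum_k n_k t_k + \sum_k n_k p_k = N\), should hold at the fitted parameters:

# sanity check: expected group counts at the solution should sum to N = 1500
ns[1] * z0(sol) +
  sum(sapply(0:16, function(k) ns[k+1] * t(k,sol))) +
  sum(sapply(0:16, function(k) ns[k+1] * p(k,sol)))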

Part (c)

Estimate the standard errors and pairwise correlations of your parameters, using any available method.

We have chosen to use the bootstrap.

# ns = (379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
# 379 responded 0 encounters
# 299 responded 1 encounters
# 222 responded 2 encounters
# ...
# 1 responded 16 encounters
#
# to resample, we reconstruct the individual-level data set that
# includes each person's response, as determined by ns
data <- rep(0:16, ns)

make_into_counts <- function(data)
{
  # counts of responses 0,1,...,16 (inverse of the rep() above)
  tabulate(factor(data, levels=0:16))
}

m <- 1000 # bootstrap replicates
em_steps <- 100
# the first replicate is the estimate computed from the original sample
theta.bs <- em(theta,ns,em_steps,FALSE)
thetas <- rbind(theta.bs)
for (i in 2:m)
{
  indices <- sample(N,N,replace=TRUE)
  resampled <- make_into_counts(data[indices])
  theta.bs <- em(theta,resampled,em_steps,FALSE)
  thetas <- rbind(thetas,theta.bs)
  if (i %% 100 == 0)
  {
    cat("iteration ", i, "\n")
    print(cov(thetas))
  }
}
## iteration  100 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003903744 -0.0001438786 0.0017263560 0.001571458
## [2,] -0.0001438786  0.0004087653 0.0004993799 0.001813153
## [3,]  0.0017263560  0.0004993799 0.0152443668 0.015694549
## [4,]  0.0015714585  0.0018131530 0.0156945490 0.040863019
## iteration  200 
##               [,1]          [,2]        [,3]        [,4]
## [1,]  0.0004023631 -0.0001703146 0.001697344 0.001484011
## [2,] -0.0001703146  0.0004447321 0.000419080 0.001903868
## [3,]  0.0016973442  0.0004190800 0.014398069 0.015896531
## [4,]  0.0014840108  0.0019038683 0.015896531 0.044389192
## iteration  300 
##               [,1]          [,2]         [,3]       [,4]
## [1,]  0.0003956313 -0.0001727811 0.0016888749 0.00159774
## [2,] -0.0001727811  0.0004628827 0.0003008066 0.00181497
## [3,]  0.0016888749  0.0003008066 0.0137001437 0.01583219
## [4,]  0.0015977404  0.0018149703 0.0158321853 0.04443171
## iteration  400 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003966960 -0.0001719888 0.0016451275 0.001548858
## [2,] -0.0001719888  0.0004684167 0.0003622689 0.002003458
## [3,]  0.0016451275  0.0003622689 0.0137218437 0.016201510
## [4,]  0.0015488580  0.0020034577 0.0162015099 0.046175397
## iteration  500 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004135674 -0.0001873768 0.0016694824 0.001624097
## [2,] -0.0001873768  0.0004681143 0.0003021341 0.001786532
## [3,]  0.0016694824  0.0003021341 0.0136412731 0.015939754
## [4,]  0.0016240973  0.0017865318 0.0159397544 0.043573619
## iteration  600 
##               [,1]          [,2]        [,3]        [,4]
## [1,]  0.0004111393 -0.0001937508 0.001653108 0.001534535
## [2,] -0.0001937508  0.0004822347 0.000260584 0.001829480
## [3,]  0.0016531076  0.0002605840 0.013451057 0.015368992
## [4,]  0.0015345352  0.0018294799 0.015368992 0.042992734
## iteration  700 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004259534 -0.0001994680 0.0017338084 0.001631373
## [2,] -0.0001994680  0.0004943474 0.0002033533 0.001795465
## [3,]  0.0017338084  0.0002033533 0.0136916465 0.015681755
## [4,]  0.0016313734  0.0017954653 0.0156817555 0.043016583
## iteration  800 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004134958 -0.0001932351 0.0016624262 0.001607135
## [2,] -0.0001932351  0.0004925245 0.0002242567 0.001774913
## [3,]  0.0016624262  0.0002242567 0.0131371751 0.015149211
## [4,]  0.0016071351  0.0017749127 0.0151492114 0.042293737
## iteration  900 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004139774 -0.0002014407 0.0016479052 0.001583003
## [2,] -0.0002014407  0.0004968611 0.0001793993 0.001649937
## [3,]  0.0016479052  0.0001793993 0.0129764348 0.014710759
## [4,]  0.0015830026  0.0016499365 0.0147107588 0.041445055
## iteration  1000 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004196823 -0.0001981646 0.0016697031 0.001606301
## [2,] -0.0001981646  0.0004857090 0.0001668595 0.001625460
## [3,]  0.0016697031  0.0001668595 0.0129729349 0.014645101
## [4,]  0.0016063014  0.0016254599 0.0146451012 0.041083606
cov.bs <- cov(thetas)
cor.bs <- cor(thetas)

The bootstrap estimator of the covariance matrix (rows and columns ordered as \(\alpha, \beta, \mu, \lambda\)) is given by

##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004196823 -0.0001981646 0.0016697031 0.001606301
## [2,] -0.0001981646  0.0004857090 0.0001668595 0.001625460
## [3,]  0.0016697031  0.0001668595 0.0129729349 0.014645101
## [4,]  0.0016063014  0.0016254599 0.0146451012 0.041083606

and the correlation matrix (same ordering) is given by

##            [,1]        [,2]       [,3]      [,4]
## [1,]  1.0000000 -0.43891212 0.71558269 0.3868409
## [2,] -0.4389121  1.00000000 0.06647277 0.3638764
## [3,]  0.7155827  0.06647277 1.00000000 0.6343647
## [4,]  0.3868409  0.36387642 0.63436466 1.0000000
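The problem asks for standard errors; these are the square roots of the diagonal of the bootstrap covariance matrix:

# bootstrap standard errors for (alpha, beta, mu, lambda)
sqrt(diag(cov.bs))

From the covariance matrix printed above, these are approximately \(0.0205\), \(0.0220\), \(0.1139\), and \(0.2027\) for \(\alpha\), \(\beta\), \(\mu\), and \(\lambda\), respectively.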

Let’s compare with the inverse of the observed information matrix, where the observed information is the negative Hessian of the log-likelihood evaluated at the MLE.

library(numDeriv)
# observed-data log-likelihood of the mixture model
loglike <- function(theta)
{
  s <- 0
  for (x in data)
  {
    s <- s + log(theta[1]*as.numeric(x==0) +
                 theta[2]*dpois(x,theta[3]) +
                 (1-theta[1]-theta[2])*dpois(x,theta[4]))
  }
  s
}
mle <- c(0.1221661,0.5625419,1.4674746,5.9388889) # the EM solution from part (b)
# inverse of the observed information (negative Hessian at the MLE)
solve(-hessian(loglike,mle))
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003799048 -1.909698e-04 1.438556e-03 0.001184057
## [2,] -0.0001909698  4.657702e-04 7.132638e-05 0.001417409
## [3,]  0.0014385560  7.132638e-05 1.111722e-02 0.011376017
## [4,]  0.0011840568  1.417409e-03 1.137602e-02 0.034664940
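Taking square roots of the diagonal, the information-based standard errors are approximately \(0.0195\), \(0.0216\), \(0.1054\), and \(0.1862\), in reasonable agreement with the bootstrap estimates above.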